From bala@llnl.gov Fri Jul 5 20:10:35 2002
From: bala@llnl.gov (Bala Govindasamy)
Date: Fri, 05 Jul 2002 12:10:35 -0700
Subject: [ccm-users] land model gives error message for pe < 16?
Message-ID: <3D25EF2B.3EB6C7F2@llnl.gov>
Dear CAM users,
When I run CAM2 on fewer than 16 IBM processors (e.g. 8 or 4), the model
stops running.
For example, on 8 processors, I get the following error message:
water balance nstep = 1 point = 4742 imbalance = 468.14 mm
clm model is stopping
ENDRUN IS BEING CALLED
On 4 processors, I get a similar message at nstep = 4.
But there is no problem on 16, 32, or 64 processors.
Any ideas?
Thanks,
--
Bala
----------------------------------------
Bala Govindasamy
L-103, Atmospheric Science Division
Lawrence Livermore National Laboratory
Livermore
CA 94550
Ph.: 925 423 0771
Fax: 925 422 6388
Email: bala@LLNL.GOV; bala_indu@yahoo.com
http://en-env.llnl.gov/cccm/balacv.html
-----------------------------------------
From dpierce@ucsd.edu Mon Jul 8 17:29:04 2002
From: dpierce@ucsd.edu (Dave Pierce)
Date: Mon, 8 Jul 2002 09:29:04 -0700 (PDT)
Subject: [ccm-users] Bug in CCSM2
Message-ID:
Hi folks,
there seems to be a bug in ccsm's file
models/atm/cam/src/control/ccsm_msg.F90, around line 1485. Right now the
code looks like this:
#if (defined SPMD)
   do n=1,nrcv
      do lat=1,plat
         arget_buf(:,n,lat) = arget(:,lat,n)
      end do
   end do
The problem is that arget_buf is allocated only on the master processor.
You might expect this to cause strange errors on some platforms, with
results that are hard to trace and non-reproducible. I think it should
instead be:
#if (defined SPMD)
   if ( masterproc ) then
      do n=1,nrcv
         do lat=1,plat
            arget_buf(:,n,lat) = arget(:,lat,n)
         end do
      end do
   endif
Perhaps one of the model coders could verify this conjecture.
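The guard described above can be illustrated with a minimal standalone
sketch. It is not the CAM source: the program name, the array sizes
(nfld, plat, nrcv) and the hard-coded masterproc flag are assumptions
chosen for illustration, and no MPI is involved. It only shows that every
reference to a buffer allocated on one task should sit behind the same
condition as the allocation.

```fortran
program guard_sketch
   implicit none
   ! Hypothetical sizes; the real dimensions come from the model grid.
   integer, parameter :: nfld = 3, plat = 4, nrcv = 2
   logical :: masterproc
   real :: arget(nfld, plat, nrcv)
   real, allocatable :: arget_buf(:,:,:)
   integer :: n, lat

   ! Assumption: set to .false. to mimic a non-master task.
   masterproc = .true.
   arget = 1.0

   ! The buffer exists only on the master task.
   if ( masterproc ) allocate(arget_buf(nfld, nrcv, plat))

   ! Touch the buffer only where it was allocated.
   if ( masterproc ) then
      do n = 1, nrcv
         do lat = 1, plat
            arget_buf(:, n, lat) = arget(:, lat, n)
         end do
      end do
      print *, 'buffer filled, sum =', sum(arget_buf)
      deallocate(arget_buf)
   end if
end program guard_sketch
```

With masterproc set to .false. the program skips both the allocation and
the copy, so the unallocated array is never referenced.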
Also, in file models/ice/csim4/src/source/ice_itd.F, around line 163, the
original code is like this:
if (my_task.eq.master_task) then
   write (6,*) ''
   write (6,*) 'hin_max(nc-1) < Cat nc < hin_max(nc)'
For some reason (probably a compiler bug), this causes a failure with the
PGI compilers, version 3.2-4 (I haven't tried the version 4 compilers yet).
The ice model halts with an I/O (permission denied) error on the output
file. It works if you instead have:
if (my_task.eq.master_task) then
   write (6,*) ' '
   write (6,*) 'hin_max(nc-1) < Cat nc < hin_max(nc)'
Note that the only difference is that the second line of the snippet
writes a single space to the output file rather than a null string.
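For comparison, standard Fortran also allows a list-directed write with
an empty output list, which emits a blank record without any string
literal at all. The sketch below is a standalone illustration, not the
CSIM source, and the program name is hypothetical. Whether the first form
avoids the PGI 3.2-4 failure is an untested assumption; only the
single-space form is reported above to work.

```fortran
program blank_line_sketch
   implicit none
   ! Empty output list: writes a blank record (untested with PGI 3.2-4).
   write (6,*)
   ! The workaround reported above: a single space, not a null string.
   write (6,*) ' '
   write (6,*) 'hin_max(nc-1) < Cat nc < hin_max(nc)'
end program blank_line_sketch
```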
Regards,
--Dave
---------------------------------------------------------------
David W. Pierce / Climate Research Division
Scripps Institution of Oceanography / (858) 534-8276 (voice)
dpierce@ucsd.edu / (858) 534-8561 (fax)
---------------------------------------------------------------
From erik@ucar.edu Thu Jul 11 20:30:49 2002
From: erik@ucar.edu (Erik Kluzek)
Date: Thu, 11 Jul 2002 13:30:49 -0600 (MDT)
Subject: [ccm-users] Bug in CCSM2
In-Reply-To:
Message-ID:
On Mon, 8 Jul 2002, Dave Pierce wrote:
>
> there seems to be a bug in ccsm's file
> models/atm/cam/src/control/ccsm_msg.F90, around line 1485. Right now the
> code looks like this:
>
> #if (defined SPMD)
> do n=1,nrcv
> do lat=1,plat
> arget_buf(:,n,lat) = arget(:,lat,n)
> end do
> end do
>
> Problem is, arget_buf is only allocated for the master processor. You
> might expect this would cause strange errors on some platforms, with
> rather hard to trace and non-reproducable results. I think it should
> instead be:
>
> #if (defined SPMD)
> if ( masterproc ) then
> do n=1,nrcv
> do lat=1,plat
> arget_buf(:,n,lat) = arget(:,lat,n)
> end do
> end do
> endif
>
> Perhaps one of the model coders could verify this conjecture.
>
Yes, the above is a recognized problem in ccsm_msg.F90. It will be fixed
in the CAM2.0.1 and CCSM2.0.1 releases, which are scheduled for later this
month.
> Also, in file models/ice/csim4/src/source/ice_itd.F, around line 163, the
> original code is like this:
>
> if (my_task.eq.master_task) then
> write (6,*) ''
> write (6,*) 'hin_max(nc-1) < Cat nc < hin_max(nc)'
>
> For some reason (probably compiler bug) this causes a failure on the PGI
> compilers version 3.2-4 (haven't tried the version 4 compilers yet). What
> happens is that the ice model halts with an I/O (permission denied) error
> to the output file. It works if you instead have:
>
> if (my_task.eq.master_task) then
> write (6,*) ' '
> write (6,*) 'hin_max(nc-1) < Cat nc < hin_max(nc)'
>
> Note that the difference is writing a single space to the output file in
> the second line, rather than a null string.
>
I'll report this to the CSIM folks. Obviously it's something simple to
fix...
Erik Kluzek, (CGD at NCAR)
National Center for Atmospheric Research
Boulder CO, (off) (303)497-1326 (fax) (303)497-1324
--------- Home page and public PGP key---------------
http://www.cgd.ucar.edu/~erik
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
From erik@ucar.edu Thu Jul 11 20:54:56 2002
From: erik@ucar.edu (Erik Kluzek)
Date: Thu, 11 Jul 2002 13:54:56 -0600 (MDT)
Subject: [ccm-users] Moving CCM-users over to cam-users...
Message-ID:
All,
I will be moving everyone who is currently on
ccm-users@ucar.edu over to the new CAM users e-mail
list, "cam-users@ucar.edu". Messages regarding both
CAM and the CCM are being sent to both lists, and I
think it would be cleaner (and easier for me) to have a single
list to manage. Messages regarding the CCM can still
be sent to the "cam-users" list. Once everyone is moved
over, I'll deactivate the "ccm-users" list and prevent
anyone from subscribing to it. The CCM3 web page will
also refer to the cam-users e-mail list for new
questions.
If you don't want to be on the cam-users
list, either unsubscribe by going to
http://mailman.ucar.edu/mailman/listinfo/cam-users/
or send me an e-mail and I'll take you off the list.
Thanks
From erik@ucar.edu Thu Jul 11 20:58:35 2002
From: erik@ucar.edu (Erik Kluzek)
Date: Thu, 11 Jul 2002 13:58:35 -0600 (MDT)
Subject: [ccm-users] ccm-users and cam-users now unmoderated for list subscribers...
Message-ID:
All,
In the past, the ccm-users list was moderated to
eliminate spam. I've now opened up the list to allow
messages from list members. If we start having problems
with inappropriate messages or spam on the list again,
I'll lock it back down.
Thanks
From erik@ucar.edu Thu Jul 11 21:07:16 2002
From: erik@ucar.edu (Erik Kluzek)
Date: Thu, 11 Jul 2002 14:07:16 -0600 (MDT)
Subject: [ccm-users] land model gives error message for pe < 16?
In-Reply-To: <3D25EF2B.3EB6C7F2@llnl.gov>
Message-ID:
On Fri, 5 Jul 2002, Bala Govindasamy wrote:
>
> When I run CAM2 on less than 16 IBM processors (e.g. 8, 4) the model
> stops running
>
> For example, on 8 proc, I get the following error message:
> water balance nstep = 1 point = 4742 imbalance = 468.14 mm
> clm model is stopping
> ENDRUN IS BEING CALLED
>
> On 4 procs, I get a similar message at nstep =4
>
> But there is no problem on 16, 32 and 64 processors.
>
Bala,
I just did some simple tests with 8 processors (2 nodes with
4 threads each), and it worked for me. Can you send more specifics?
Your config_cache.xml file, namelist, and the commands you are using
to invoke the executable (environment variables and poe command line)
would be helpful in reproducing the problem. Any specifics on
the machine you are running on might also be useful: the machine name,
compiler version, and OS version.