[Tgcmgroup] mss dispose

Ben Foster <foster@hao.ucar.edu>
Wed, 16 Jul 2003 10:08:45 -0600 (MDT)


Hi tgcmgroup:

I have prepared modifications to tiegcm1 and tgcm24 to allow batch jobs 
to dispose history files to the mss in a separate job step after model 
execution rather than during execution.

Normally, when the namelist parameter DISPOSE=1, the model disposes history 
files to the mss from the Fortran code during model execution. This is 
inefficient because only the master task does the dispose, leaving all other 
tasks idle. Also, if the mss is down or heavily loaded, the model can exceed 
the wallclock limit before finishing the run while it waits for the mss.

The DISPOSE namelist parameter can now have one of three settings:

; DISPOSE=0 -> do not dispose to mss.
; DISPOSE=1 -> dispose to mss during model execution.
; DISPOSE=2 -> dispose to mss after model execution.
 
To enable the DISPOSE=2 option, you must use the mods in the following
directories, depending on the model you are running:

/home/tgcm/tgcm24/modsrc.dispose  (mods to fixed code in /home/tgcm/tgcm24)
/home/tgcm/tiegcm1/modsrc.dispose (mods to fixed code in /home/tgcm/tiegcm1)

Let me know if you need help using or merging these mods.  You must also use 
a new job script. There are sample job scripts in the above modsrc.dispose 
directories (*.job files). These scripts use 3 job steps as follows:

Step 0: Build and execute the model (parallel, regular queue)
Step 1: Dispose history files to mss if DISPOSE was 2 (serial, share queue)
Step 2: Return output listing (serial, interactive queue)

If the model crashes or times out, the dispose step will still be executed,
correctly disposing the disk files that were written before the crash, so
this method preserves our save/restart capability.
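
The control flow of the three steps can be sketched in plain sh as follows.
This is only an illustration of the ordering, not the contents of the sample
*.job files: the function names, the LOG variable, and the simulated exit
status are all hypothetical stand-ins.

```shell
#!/bin/sh
# Hypothetical sketch of the three-step flow. It is NOT the real job script;
# run_step0, run_step1, run_step2, run_job, LOG and MODEL_STATUS are
# illustrative names only.

LOG=""
MODEL_STATUS=0

run_step0() {   # Step 0: build and execute the model ($1 = simulated exit status)
    LOG="${LOG}step0;"
    return "$1"
}

run_step1() {   # Step 1: dispose history files only if DISPOSE was 2
    if [ "$1" -eq 2 ]; then
        LOG="${LOG}step1:dispose;"
    else
        LOG="${LOG}step1:skip;"
    fi
}

run_step2() {   # Step 2: return the output listing
    LOG="${LOG}step2;"
}

run_job() {     # $1 = DISPOSE setting, $2 = simulated model exit status
    LOG=""
    MODEL_STATUS=0
    # A crash or timeout in step 0 is recorded but does not stop later steps.
    run_step0 "$2" || MODEL_STATUS=$?
    run_step1 "$1"
    run_step2
}
```

For example, "run_job 2 1" simulates a crashed model with DISPOSE=2: the
dispose step still runs, and the failure is kept in MODEL_STATUS.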

Note that the mss dispose step (Step 1, the second step) uses the share queue. 
The maximum wallclock limit in this queue is 6 hours. If you are disposing 
several history files, set the wallclock limit for this step fairly high to 
allow time for the mss. If the dispose step exceeds the wallclock limit (or 
the mss is down), you can go to the execution directory (assuming you had 
POSTCLEAN=0 in the job script) and interactively dispose the disk files by 
executing the dispose.csh shell script. (This script is built by the model 
during execution if DISPOSE is 2.) If some of your files were disposed before 
the timeout, you can first edit dispose.csh to remove the commands for those 
files, so that only the remaining files are disposed.
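
If you would rather not edit dispose.csh by hand, a small sh helper along the
following lines might do the filtering. It is a sketch under a loud
assumption: that each command in dispose.csh sits on its own line and names
the file it disposes. I have not shown the real script format here, so check
the output before running it. The function name skip_disposed is
hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print SCRIPT without the lines that mention any of
# the given file names. Assumes one dispose command per line (an assumption,
# not a documented property of dispose.csh).
# usage: skip_disposed SCRIPT NAME [NAME...]
skip_disposed() {
    sd_script=$1
    shift
    sd_names=$(mktemp) || return 1
    # Write one already-disposed file name per line into a pattern file.
    for sd_name in "$@"; do
        printf '%s\n' "$sd_name" >> "$sd_names"
    done
    # -F: treat names as fixed strings; -v: keep lines matching none of them;
    # -f: read the names from the pattern file.
    # Matching is by substring, so a name that is a prefix of another name
    # would also remove the longer one.
    grep -v -F -f "$sd_names" "$sd_script"
    sd_status=$?
    rm -f "$sd_names"
    # grep exits 1 when no lines are left to print; treat that as success.
    [ "$sd_status" -le 1 ]
}
```

Usage would be something like "skip_disposed dispose.csh FILE1 FILE2 >
remaining.csh" (FILE1, FILE2 and remaining.csh are placeholders), followed by
a look at remaining.csh before executing it with csh.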

--Ben


-----------------------------------------------------------------------
Ben Foster		      	High Altitude Observatory (HAO)
foster@ucar.edu			phone: 303-497-1595  fax: 303-497-1589  
Nat. Center for Atmos. Res.     P.O. Box 3000 Boulder CO 80307 USA
-----------------------------------------------------------------------