[Dart-dev] DART/branches Revision: 13021

dart at ucar.edu dart at ucar.edu
Tue Mar 19 09:03:58 MDT 2019


raeder at ucar.edu
2019-03-19 09:03:58 -0600 (Tue, 19 Mar 2019)
1010
Accelerated cycles by turning off CESM's timing evaluation
  for all cycles after the first cycle of each job.
Removed -v option from locally defined commands (COPY, ...).
  Add it to each use of the commands where it is needed.
Moved the Hide directory, parsing of forecast time, and definition of log_list
  closer to the beginning.  Hide is used to protect needed files from removal 
  by the disk management and undesired archiving by st_archive.
Accommodated file names which have a compression extension (e.g. '.gz').
Removed renaming of inflation files written out by filter.
  They're now correct in fill_inflation_restart.
In the run_shadow directory (used for debugging st_archive),
  create minimal component restart files, which have just the
  "restart history" file names in them.
Renamed TASKS_PER_NODE to NUMTASKS_PERNODE and set it automatically.
More, better exits.
Improved comments.

This has been reviewed and tested in the Reanalysis:
f.e21.FHIST_BGC.f09_025.CAM6assim.001  (Jan-June 2017).
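
Note on the automatic NUMTASKS_PERNODE setting: in the PBS branch of the
diff below, the value is the number of lines in $PBS_NODEFILE that match the
first node name listed. The following is only a minimal sketch of that
counting idea, written in Python rather than csh and not taken from the
script. It assumes the node file holds one node name per MPI task. The
function name tasks_per_node and the node names are hypothetical. Unlike a
plain grep, the sketch compares names exactly, so a node name that is a
prefix of another one is not counted twice.

```python
from collections import Counter


def tasks_per_node(nodefile_lines):
    """Return how many tasks run on the first node listed.

    Assumption: one line per MPI task, each line holding a node name.
    Names are compared exactly, not as substrings.
    """
    names = [line.strip() for line in nodefile_lines if line.strip()]
    if not names:
        raise ValueError("node file is empty")
    return Counter(names)[names[0]]


# Hypothetical node file contents: 2 nodes with 3 tasks each.
example = ["r1i0n1", "r1i0n1", "r1i0n1", "r1i0n10", "r1i0n10", "r1i0n10"]
print(tasks_per_node(example))
```

With real data the lines would come from the file itself, e.g.
tasks_per_node(open(path)) where path is the value of $PBS_NODEFILE.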




Modified: DART/branches/recam/models/cam-fv/shell_scripts/cesm2_1/assimilate.csh.template
===================================================================
--- DART/branches/recam/models/cam-fv/shell_scripts/cesm2_1/assimilate.csh.template	2019-03-18 23:26:12 UTC (rev 13020)
+++ DART/branches/recam/models/cam-fv/shell_scripts/cesm2_1/assimilate.csh.template	2019-03-19 15:03:58 UTC (rev 13021)
@@ -6,20 +6,40 @@
 #
 # DART $Id$
 
-# ---------------------
-# Purpose
-# ---------------------
+# ------------------------------------------------------------------------------
+# Purpose: assimilate with a CAM ensemble and perform advanced archiving
+#          and compression in support of multiple assimilation cycles in a
+#          single CESM job.
+#
+# The (resulting) assimilate.csh script is called by CESM with two arguments:
+# 1) the CASEROOT, and
+# 2) the assimilation cycle number in this CESM job
+# ------------------------------------------------------------------------------
 # This template is lightly modified by the setup scripts to be appropriate
 # for specific hardware and other configurations. The modified result is
 # then given execute permission and is appropriate to use for an assimilation.
 # All of this is automatically performed by the DART-supplied setup scripts.
-# 
-# Tag DART's state output with names using CESM's convention:  
-#    ${case}.${scomp}[_$inst].${filetype}[.$dart_file].${date}.nc 
+#
+# Tag DART's state output with names using CESM's convention:
+#    ${case}.${scomp}[_$inst].${filetype}[.$dart_file].${date}.nc
 #    These should all be named with $scomp = "cam" to distinguish
 #    them from the same output from other components in multi-component assims.
+#
+# This script also has logic in it to manage disk space in a way that allows
+# for more assimilation cycles to be performed before archiving without losing
+# critical restart capability. The same logic is also useful for assimilations
+# that may require multiple timesteps to be available.
+#
+# As a specific example, consider the case when 3 assimilation cycles have been
+# performed: 6Z, 12Z, 18Z.
+# If we want to keep a restart set and a backup
+# restart set, we only need the 18Z and 12Z, so the 6Z set can be removed.
+# Let's also say that it's the last cycle of the job - which automatically kicks off
+# the short-term archiver. If we did 'nothing', the 12Z and 18Z get archived
+# and the 18Z gets restaged
 
 # machine-specific dereferencing
+
 if ($?SLURM_JOB_ID) then
 
    # SLURM environment variables:
@@ -30,7 +50,10 @@
    setenv       JOBID $SLURM_JOBID
    setenv     MYQUEUE $SLURM_JOB_PARTITION
    setenv   NODENAMES $SLURM_NODELIST
-   setenv   LAUNCHCMD "mpirun -np $SLURM_NTASKS -bind-to core"
+   setenv LAUNCHCMD "mpirun -np $SLURM_NTASKS -bind-to core"
+#  untested method for determining NUMTASKS_PERNODE with SLURM
+#  set ANY_OLD_NODE = `head -n 1 $SLURM_NODELIST`
+#  setenv NUMTASKS_PERNODE `grep $ANY_OLD_NODE $SLURM_NODELIST | wc -l`
 
 else if ($?PBS_NODEFILE) then
 
@@ -44,14 +67,18 @@
    setenv     NUMCPUS $NCPUS
    setenv    NUMTASKS `cat  $PBS_NODEFILE | wc -l`
    setenv    NUMNODES `uniq $PBS_NODEFILE | wc -l`
-   setenv MPIEXEC_MPT_DEBUG 0
+   set ANY_OLD_NODE = `head -n 1 $PBS_NODEFILE`
+   setenv    NUMTASKS_PERNODE `grep $ANY_OLD_NODE $PBS_NODEFILE | wc -l`
+   setenv  MPIEXEC_MPT_DEBUG 0
    setenv MP_DEBUG_NOTIMEOUT yes
-   setenv   LAUNCHCMD mpiexec_mpt
+   setenv          LAUNCHCMD mpiexec_mpt
 
-   echo "jobname  :  $JOBNAME"
-   echo "numcpus  :  $NUMCPUS"
-   echo "numtasks :  $NUMTASKS"
-   echo "numnodes :  $NUMNODES"
+   echo "jobname        : $JOBNAME"
+   echo "numcpus        : $NUMCPUS"
+   echo "numtasks       : $NUMTASKS"
+   echo "numnodes       : $NUMNODES"
+   echo "tasks_per_node : $NUMTASKS_PERNODE"
+   echo " "
 
 else if ($?LSB_HOSTS) then
 
@@ -64,25 +91,29 @@
    setenv     MYQUEUE $LSB_QUEUE
    setenv   NODENAMES ${LSB_HOSTS}
    setenv MP_DEBUG_NOTIMEOUT yes
-   setenv  LAUNCHCMD   mpirun.lsf
+   setenv LAUNCHCMD mpirun.lsf
+#  untested method for determining NUMTASKS_PERNODE with LSF
+#  setenv NUMTASKS_PERNODE \
+#     `echo $LSB_SUB_RES_REQ | sed -ne '/ptile/s#.*\[ptile=\([0-9][0-9]*\)]#\1#p'`
 
 endif
 
-#=========================================================================
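
Note on the restart-set example in the new header comment (6Z, 12Z, 18Z,
keep a restart set plus one backup): the rule amounts to keeping the newest
two sets and treating older ones as removable. The following is only a
minimal sketch of that rule in Python. It is not how assimilate.csh
implements its disk management, and the function name split_restart_sets,
the keep parameter, and the date used are hypothetical.

```python
from datetime import datetime


def split_restart_sets(cycle_times, keep=2):
    """Split cycle times into (kept, removable), keeping the newest `keep`."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ordered = sorted(cycle_times)
    return ordered[-keep:], ordered[:-keep]


# Hypothetical day with three completed assimilation cycles.
cycles = [datetime(2017, 1, 1, hour) for hour in (6, 12, 18)]
kept, removable = split_restart_sets(cycles, keep=2)
print("keep:  ", [t.strftime("%HZ") for t in kept])
print("remove:", [t.strftime("%HZ") for t in removable])
```

For the three cycles above this keeps the 12Z and 18Z sets and marks the 6Z
set as removable, matching the example in the comment.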

