[Dart-dev] [7735] DART/trunk/models/cam/shell_scripts: new scripts which run multiple forecast cycles in a single

nancy at ucar.edu nancy at ucar.edu
Mon Mar 16 14:54:24 MDT 2015


Revision: 7735
Author:   nancy
Date:     2015-03-16 14:54:24 -0600 (Mon, 16 Mar 2015)
Log Message:
-----------
new scripts which run multiple forecast cycles in a single
batch job.

Added Paths:
-----------
    DART/trunk/models/cam/shell_scripts/README
    DART/trunk/models/cam/shell_scripts/archive_cycles.csh
    DART/trunk/models/cam/shell_scripts/submit_cycles.csh

-------------- next part --------------
Copied: DART/trunk/models/cam/shell_scripts/README (from rev 7729, DART/branches/cam/models/cam/shell_scripts/README)
===================================================================
--- DART/trunk/models/cam/shell_scripts/README	                        (rev 0)
+++ DART/trunk/models/cam/shell_scripts/README	2015-03-16 20:54:24 UTC (rev 7735)
@@ -0,0 +1,92 @@
+# DART software - Copyright 2004 - 2013 UCAR. This open source software is
+# provided by UCAR, "as is", without charge, subject to all terms of use at
+# http://www.image.ucar.edu/DAReS/DART/DART_download
+#
+# DART $Id$
+
+This README describes how to use the scripts in this directory
+to set up a CESM (active atmospheric component) assimilation.
+
+This includes changes to CESM and DART scripts which enable 
+multiple CAM forecasts and assimilations ("cycles") in a single 
+LSF job on yellowstone.  This reduces the time spent waiting 
+in the queue by a large percentage.  It also moves the short 
+term archiver tasks into a separate, single-task job, which 
+saves ~30% of the core hours used in the standard workflow mode 
+and saves even more by eliminating most of the copies that are 
+in st_archive.csh.  
+
+The "traditional", single-cycle-per-job way of running CESM+DART is to
+1) Build the DART executables in ../work, as described elsewhere.
+2) Set assimilation parameters in ../work/input.nml according to your needs,
+   as described elsewhere.
+3) Set set-up parameters in CESM{version}_setup_hybrid and locate the files
+   referenced in that script (obs_seq, initial ensemble, ...),
+   as described in that script.
+4) Run CESM{version}_setup_hybrid interactively to set up the 
+   $CASEROOT and $RUNDIR directories and put/modify all the 
+   necessary scripts and other files there.
+5) Follow directions printed at the end of the CESM{version}_setup_hybrid run.
+6) Run $CASE.submit to submit a single cycle job with 
+      CONTINUE_RUN = FALSE
+7) Change 
+      CONTINUE_RUN = TRUE
+      RESUBMIT = as many cycles as desired.
+   $CASE.run will resubmit itself to the queue RESUBMIT times.
+8) Short- and/or long-term archiving may be done at the end of each cycle
+   (that is, when each job finishes), as set in env_run.xml.
+
+The new, multicycle way of running CESM+DART uses a different 
+workflow starting with 7).  Instead of $CASE.run resubmitting 
+itself to the queue RESUBMIT times, the variant script 
+$CASE.run_cycles will recursively run itself RESUBMIT times 
+within a single job.  RESUBMIT will be decremented as usual
+at the end of each assimilation cycle, and will be 0 at the end
+of the job.
+
+There is a new submit script as well; $CASE.submit_cycles 
+("submit_cycles.csh" in this directory),
+which is run interactively in $CASEROOT, just like 
+$CASE.submit in the standard CESM workflow.
+It can submit any number of multicycle jobs ($CASE.run_cycles), 
+each dependent on the previous one.  These are submitted in groups,
+separated by single processor archiving jobs, 
+to prevent the disk from filling up.  
+The user organizes this series of jobs in $CASE.submit_cycles by 3 parameters.
+   RESUBMIT = the number of cycles which will fit in the wall clock limit, minus 1
+   jobs_loop = the number of multi-cycle jobs which can be run
+               before the disk fills up
+   archive_loop = the outermost loop; the number of times archive_cycles.csh
+                  will be run
+So the total number of assimilation cycles will be
+($RESUBMIT + 1) * jobs_loop * archive_loop.
+So,
+  7) Edit $CASE.submit_cycles 
+  8) Run $CASE.submit_cycles interactively
+
+
+
+Another way to describe the multi-cycle workflow follows.
+After the new CESM{version}_setup_cycles and CESM_DART_config scripts 
+set up the case, the calling tree of these scripts is
+-> $CASE.submit_cycles                  (run this interactively in $CASEROOT)
+   -> $CASE.run_cycles #1               (created by CESM{version}_setup_cycles)
+      -> the next $CASE.run_cycles      (NOT as a batch job; a recursion)
+         ... repeats until RESUBMIT has been reduced to 0.
+
+  [-> $CASE.run_cycles #2,....]         (after previous set of $CASE.run_cycles finishes)
+                                        (Resets RESUBMIT to initial value)
+   -> archive_cycles.csh                (waits for the series of $CASE.run_cycles)
+      archives selected restarts
+      archives selected history files
+      -> lt_archive.sh -m copy_dirs_hsi
+
+   -> $CASE.run_cycles #($jobs_loop +1)   (waits until the archive_cycles.csh is done)
+      ... 
+
+
+# <next few lines under version control, do not edit>
+# $URL$
+# $Revision$
+# $Date$
+
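
A note on the README arithmetic above. The following is a minimal Python
sketch, not part of the commit, of how the three parameters combine and in
what order the batch jobs are chained. The function names and the sample
values are hypothetical; only the formula ($RESUBMIT + 1) * jobs_loop *
archive_loop and the run/archive ordering are taken from the README.

```python
def total_cycles(resubmit, jobs_loop, archive_loop):
    """Total assimilation cycles, using the formula stated in the README."""
    return (resubmit + 1) * jobs_loop * archive_loop


def job_sequence(jobs_loop, archive_loop):
    """List the batch jobs in submission order.

    Each entry depends on the one before it: jobs_loop multi-cycle jobs,
    then one archiving job, repeated archive_loop times.
    """
    sequence = []
    for a in range(1, archive_loop + 1):
        for j in range(1, jobs_loop + 1):
            sequence.append(f"run_cycles (archive loop {a}, job {j})")
        sequence.append(f"archive_cycles (archive loop {a})")
    return sequence


if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    print(total_cycles(resubmit=9, jobs_loop=2, archive_loop=3))
    for job in job_sequence(jobs_loop=2, archive_loop=3):
        print(job)
```

With the defaults in submit_cycles.csh (RESUBMIT = 9, jobs_loop = 1,
archive_loop = 1) the same formula gives 10 cycles in one multi-cycle job,
followed by one archiving job.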

Copied: DART/trunk/models/cam/shell_scripts/archive_cycles.csh (from rev 7729, DART/branches/cam/models/cam/shell_scripts/archive_cycles.csh)
===================================================================
--- DART/trunk/models/cam/shell_scripts/archive_cycles.csh	                        (rev 0)
+++ DART/trunk/models/cam/shell_scripts/archive_cycles.csh	2015-03-16 20:54:24 UTC (rev 7735)
@@ -0,0 +1,214 @@
+#!/bin/csh -f
+#
+# DART software - Copyright 2004 - 2013 UCAR. This open source software is
+# provided by UCAR, "as is", without charge, subject to all terms of use at
+# http://www.image.ucar.edu/DAReS/DART/DART_download
+#
+# DART $Id$
+
+# This script is used to archive DART and CESM files when running multiple
+# CESM/DART cycles in a single batch job.  See the README file for more details.
+
+#BSUB -o archive_cycles.%J
+#BSUB -e archive_cycles.%J
+#BSUB -W 11:00
+#BSUB -N 
+#BSUB -q caldera
+#BSUB -n 1
+#BSUB -P ZZZZZZZZ
+#BSUB -J archive_cycles.csh
+
+# ==============================================================================
+# Load environment variables from CESM 
+# ==============================================================================
+
+cd BOGUS_CASE
+
+source ./Tools/ccsm_getenv || exit -1
+
+# Set the archive frequency for restart sets. The others will be removed.
+set archive_Nth_days = 4
+
+# ==============================================================================
+# standard commands:
+# 
+# If you are running on a machine where the standard commands are not in the
+# expected location, add a case for them below.
+# ==============================================================================
+
+set nonomatch       # suppress "rm" warnings if wildcard does not match anything
+
+# The FORCE options are not optional.
+# The VERBOSE options are useful for debugging, though
+# some systems don't like the -v option to any of the following.
+switch ("`hostname`")
+   case be*:
+      # NCAR "bluefire"
+      set   MOVE = '/usr/local/bin/mv -fv'
+      set   COPY = '/usr/local/bin/cp -fv --preserve=timestamps'
+      set REMOVE = '/usr/local/bin/rm -fr'
+      set   LINK = '/usr/local/bin/ln -fvs'
+      breaksw
+
+   default:
+      # NERSC "hopper", NWSC "yellowstone"
+      set   MOVE = '/bin/mv -fv'
+      set   COPY = '/bin/cp -fv --preserve=timestamps'
+      set REMOVE = '/bin/rm -fr'
+      set   LINK = '/bin/ln -fvs'
+      breaksw
+endsw
+
+# ==============================================================================
+# Use the CESM coupler files to make a file list; options are to
+#  delete them, archive them, or leave them in place. 
+# ==============================================================================
+
+cd $RUNDIR
+set most_recent = `ls -1t *cpl*.r* | head -1`
+set most_recent_date =  `echo $most_recent | sed "s/\.nc//; s/^.*\.r\.//;"`
+echo THE MOST RECENT DATE, $most_recent_date, RESTARTS WILL be saved 
+
+foreach cplfile ( *cpl*.r* )
+
+   # Extract the date from the cpl restart file name
+   set date = `echo $cplfile | sed "s/\.nc//; s/^.*\.r\.//;"`
+   
+   if ($date != $most_recent_date) then
+      set temp = `echo $date | sed -e "s#-# #g"`
+      set day = $temp[3]
+      set secs = $temp[4]
+      if ( $secs == 00000 && (`expr $day % $archive_Nth_days` == 0 || $day == 28) ) then
+
+          echo $date : archiving files from this date
+          if ( -e $DOUT_S_ROOT/rest/${date} ) then
+             echo directory already exists
+          else
+             echo creating directory
+             mkdir -p $DOUT_S_ROOT/rest/${date}
+          endif
+          $MOVE *.r*.${date}*                  $DOUT_S_ROOT/rest/${date}
+          $MOVE *.i.${date}*                   $DOUT_S_ROOT/rest/${date}
+          $MOVE *prior_inflate_restart.${date} $DOUT_S_ROOT/rest/${date}
+          $MOVE *post_inflate_restart.${date}  $DOUT_S_ROOT/rest/${date}
+          $COPY *cam*.h*.${date}*              $DOUT_S_ROOT/rest/${date}
+          $COPY *clm*.h*.${date}*              $DOUT_S_ROOT/rest/${date}
+      else
+         echo $date : deleting files from this date
+         $REMOVE *.r*.${date}* 
+         $REMOVE *.i.${date}*  
+         $REMOVE *prior_inflate_restart.${date}
+         $REMOVE *post_inflate_restart.${date}
+      endif
+   
+   else
+       echo $date : preserving files from this date
+   endif
+
+end
+
+# ==============================================================================
+# archive history files
+# ==============================================================================
+
+# Save most recent day's worth of history files, for potential continuing accumulation.
+# This assumes you are doing 6 hour assimilation cycles.
+set times = (00000 21600 43200 64800)
+foreach t ($times)
+   set most_recent = `ls -1t *cpl*.r*-$t* | head -1`
+   if ("$most_recent" == "") then
+      # Move on to the next iteration of the t loop.
+      continue
+   endif
+
+   set most_recent_date =  `echo $most_recent | sed "s/\.nc//; s/^.*\.r\.//;"`
+   echo THE MOST RECENT DATE, $most_recent_date, HISTORY FILES WILL be saved 
+   
+   if ( -e TEMP ) then
+      echo TEMP directory already exists to hold current history files
+   else
+      echo creating TEMP directory to hold current history files
+      mkdir TEMP 
+   endif
+
+   #move current history files into this directory
+   $MOVE *cam*.h*.${most_recent_date}*   TEMP
+   $MOVE *pop*.h*.${most_recent_date}*   TEMP
+   $MOVE *pop*.d*.${most_recent_date}*   TEMP
+   $MOVE *rtm*.h*.${most_recent_date}*   TEMP
+   $MOVE *clm2*.h*.${most_recent_date}*  TEMP
+   $MOVE *cice*.h*.${most_recent_date}*  TEMP
+   # Move CAM-SE grid files out of the way
+   $MOVE *Mapping*.nc                    TEMP
+end
+
+# Now archive all other history files.
+# All times (except for those hidden in TEMP) will be moved to the archive directory.
+# Excluding history files based on times requires extra code here.
+
+# Each of these entries will have * prepended and appended to it.
+set files = ('cpl.log.'       'cesm.log.'      'dart_log.'      'P*Diag'       'True'            \
+             'obs_seq'        '$CASE.cam*.h'   'atm_0001*.log.' '$CASE.clm*.h' 'lnd_0001*.log.'  \
+             'ice_0001*.log.' 'atm.log.'       'lnd.log.')
+
+#  These are parallel lists; entries here must correspond exactly to the file list directly above.
+set dests = ('cpl/logs'       'cpl/logs'       'dart/logs'      'dart/hist'      'dart/hist' \
+             'dart/hist'      'atm/hist'       'atm/logs'       'lnd/hist'       'lnd/logs'  \
+             'ice/logs'       'atm/logs'       'lnd/logs')
+
+
+if ($#files != $#dests) then
+   echo "Wordlists 'files' and 'dests' must have the same number of words in them"
+   exit 89
+endif
+
+# Make copies of the obs_seq.final files in an unarchived place,
+# to make obs space diagnostics easier.
+set o_s_finals = $RUNDIR:h/Obs_seqs
+if (! -d ${o_s_finals}) mkdir ${o_s_finals}
+
+set f = 1
+while ($f <= $#files)
+   set file_set = "*$files[$f]*"
+   ls $file_set >& /dev/null 
+   if ($status != 0) then
+      echo "Finished with all files matching $files[$f]"
+   else
+      if ("$files[$f]" == 'obs_seq' ) then
+         $COPY $file_set ${o_s_finals}
+      endif
+      
+      if (! -d $DOUT_S_ROOT/$dests[$f] ) then
+         echo "Making $DOUT_S_ROOT/$dests[$f] " 
+         mkdir -p $DOUT_S_ROOT/$dests[$f]
+      endif
+
+      $MOVE $file_set $DOUT_S_ROOT/$dests[$f]
+   endif 
+   @ f++
+end
+
+# Remove all log files that haven't been archived.
+
+$REMOVE *log*
+
+
+# move the history files in TEMP back into RUNDIR:
+$MOVE TEMP/* .
+
+# ==============================================================================
+# run the long term archiver if requested
+# ==============================================================================
+
+if ($DOUT_L_MS == 'TRUE') then
+   cd $DOUT_S_ROOT
+   $CASEROOT/Tools/lt_archive.sh -m copy_dirs_hsi
+endif
+
+exit 0
+
+# <next few lines under version control, do not edit>
+# $URL$
+# $Revision$
+# $Date$
+
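
A note on the restart selection in archive_cycles.csh above. The following is
a minimal Python sketch, not part of the commit, of the rule the foreach loop
applies to each coupler restart date. The function names and the sample file
name are hypothetical; the name pattern CASE.cpl.r.YYYY-MM-DD-SSSSS.nc is an
assumption inferred from the sed expressions in the script.

```python
import re


def date_from_restart_name(name):
    """Mimic the script's sed: drop '.nc', then everything through '.r.'."""
    name = re.sub(r"\.nc", "", name, count=1)
    return re.sub(r"^.*\.r\.", "", name, count=1)


def restart_action(date, most_recent_date, archive_nth_days=4):
    """Return 'preserve', 'archive', or 'delete' for one restart date.

    date has the form YYYY-MM-DD-SSSSS. The most recent set stays in place;
    midnight sets on every Nth day of the month, or on day 28, are archived;
    every other set is deleted.
    """
    if date == most_recent_date:
        return "preserve"
    _year, _month, day, secs = date.split("-")
    if int(secs) == 0 and (int(day) % archive_nth_days == 0 or int(day) == 28):
        return "archive"
    return "delete"


if __name__ == "__main__":
    # Hypothetical file name and dates, for illustration only.
    recent = date_from_restart_name("mycase.cpl.r.2008-11-05-00000.nc")
    for d in ("2008-11-04-00000", "2008-11-04-21600", "2008-11-05-00000"):
        print(d, restart_action(d, recent))
```

The "day == 28" clause only matters when archive_Nth_days does not divide 28;
with the script's default of 4 it is redundant.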

Copied: DART/trunk/models/cam/shell_scripts/submit_cycles.csh (from rev 7729, DART/branches/cam/models/cam/shell_scripts/submit_cycles.csh)
===================================================================
--- DART/trunk/models/cam/shell_scripts/submit_cycles.csh	                        (rev 0)
+++ DART/trunk/models/cam/shell_scripts/submit_cycles.csh	2015-03-16 20:54:24 UTC (rev 7735)
@@ -0,0 +1,111 @@
+#!/bin/csh -f
+
+# DART software - Copyright 2004 - 2013 UCAR. This open source software is
+# provided by UCAR, "as is", without charge, subject to all terms of use at
+# http://www.image.ucar.edu/DAReS/DART/DART_download
+#
+# DART $Id$
+
+# This interactive script builds and submits a series of dependent jobs that run
+# a multi-cycle DART experiment where multiple model advances and assimilations
+# are done in a single batch job, and the 'job dependency' feature of LSF is used 
+# to sequence multiple batch jobs. 
+#
+# Running this way is a bit more complicated than the basic scripts but 
+# for a long experiment will be much cheaper and should have faster throughput.
+# See the README file for more details on how to use these scripts.
+#
+# This script is intended to work only with the LSF batch system.  It would
+# need to be heavily adapted for other batch systems.
+
+# (to actually use this script, remove the following lines once you
+# have reviewed how these scripts work.)
+echo "HEY!  This is a script intended only for advanced users "
+echo "      on NCAR's yellowstone computer."
+echo "      Please consult us before using it, since it does not have the level of "
+echo "      error checking and commenting which other DART scripts have. "
+exit 218                                                                            
+ 
+# ==============================================================================
+
+# Run this from the CESM $CASE directory
+
+# Each 'job' runs RESUBMIT+1 forecast+assimilation cycles.
+#    This is defined for CESM by the RESUBMIT variable in env_run.xml,
+#    and is set in this script, and then reset by each job at its beginning.
+set RESUBMIT = 9
+
+# 'jobs_loop' jobs can be run before the $RUNDIR disk fills up
+#    and the output must be archived/purged.  
+set jobs_loop = 1
+
+# Long assimilations may require several archivings.  
+#    This is defined by archive_loop.
+set archive_loop = 1
+
+@ cycles = ($RESUBMIT + 1) * $jobs_loop * $archive_loop
+echo "The total number of forecasts submitted by this script is $cycles"
+
+# Set RESUBMIT for the first job, to make $CASE.run_cycles start with the 
+# right number of forecasts.  It will take care of itself after this.
+./xmlchange RESUBMIT=$RESUBMIT
+./xmlchange CONTINUE_RUN=TRUE
+
+source ./Tools/ccsm_getenv  || exit 14
+echo RESUBMIT is now $RESUBMIT
+
+# First job has no dependency built in.
+setenv BATCHSUBMIT_DEP `echo $BATCHSUBMIT`
+
+# Loop over the number of archivings which will be required.
+set a_l = 1
+while ($a_l <= $archive_loop)
+   echo "archive loop $a_l"
+
+   # Loop over the number of jobs which fill up the scratch space,
+   # and necessitate archiving.
+   # Each job will start with the RESUBMIT value set in this script.
+   set j_l = 1
+   while ($j_l <= $jobs_loop)
+      echo "   job loop $j_l"
+      echo "      ${BATCHSUBMIT_DEP} ./${CASE}.run_cycles" 
+
+      echo "${BATCHSUBMIT_DEP} ./${CASE}.run_cycles" >! templar
+      source templar >! out
+
+      set i = `grep Job out`
+      set jobid = `echo $i | cut -d'<' -f2 | cut -d'>' -f1`
+
+      # The next job will start if the previous job state is DONE (not EXIT).
+      setenv BATCHSUBMIT_DEP  `echo 'bsub -w "done(' $jobid ')" <'`
+      @ j_l++
+   end
+
+   # ONLY submit the post run cleanup, archive and long_term archive
+   # if completion of the last job (all of its RESUBMITs) is successful.
+
+   setenv BATCHSUBMIT_DEP  `echo 'bsub -w "done(' $jobid ')" <'`
+   echo "   ${BATCHSUBMIT_DEP} ./archive_cycles.csh"  
+
+   echo "${BATCHSUBMIT_DEP} ./archive_cycles.csh"  >! templar
+   source templar >! out
+
+   set i = `grep Job out`
+   set jobid = `echo $i | cut -d'<' -f2 | cut -d'>' -f1`
+
+   # submit the next $CASE.run_cycles only if the current archive_cycles.csh is "done"
+   setenv BATCHSUBMIT_DEP  `echo 'bsub -w "done(' $jobid ')" <'`
+
+   @ a_l++
+end
+
+if ($status == 0) then
+   rm templar out
+endif
+
+exit 0
+
+# <next few lines under version control, do not edit>
+# $URL$
+# $Revision$
+# $Date$
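
A note on the job chaining in submit_cycles.csh above. The following is a
minimal Python sketch, not part of the commit, of the two string operations
the script performs with grep, cut, and echo: pulling the job ID out of the
submission output, and building the next dependent submit command. The
function names and the sample output line are assumptions; the sketch only
builds strings and does not call bsub.

```python
def job_id_from_bsub(output):
    """Return the text between the first '<' and the next '>' on a 'Job' line.

    This mirrors: grep Job out | cut -d'<' -f2 | cut -d'>' -f1
    Returns None if no such line is found.
    """
    for line in output.splitlines():
        if "Job" in line and "<" in line:
            return line.split("<")[1].split(">")[0]
    return None


def dependent_submit(job_id, script):
    """Build a submit command that waits for job_id to reach state DONE."""
    return f'bsub -w "done({job_id})" < {script}'


if __name__ == "__main__":
    # Assumed example of what the batch system prints on submission.
    sample = "Job <12345> is submitted to queue <regular>."
    job_id = job_id_from_bsub(sample)
    print(dependent_submit(job_id, "./archive_cycles.csh"))
```

Because the dependency is on "done" rather than "ended", a job that exits
with an error leaves every later job in the chain pending, which matches the
script's comment that the next job starts only if the previous job state is
DONE (not EXIT).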

