[Dart-dev] [3819] DART/trunk/shell_scripts: Remove obsolete shell scripts from the top level directory,

nancy@ucar.edu
Thu Apr 16 10:46:27 MDT 2009


-------------- next part --------------
Deleted: DART/trunk/shell_scripts/advance_ens.csh
===================================================================
--- DART/trunk/shell_scripts/advance_ens.csh	2009-04-16 16:42:53 UTC (rev 3818)
+++ DART/trunk/shell_scripts/advance_ens.csh	2009-04-16 16:46:27 UTC (rev 3819)
@@ -1,374 +0,0 @@
-#!/bin/csh
-#
-# Data Assimilation Research Testbed -- DART
-# Copyright 2004-2007, Data Assimilation Research Section
-# University Corporation for Atmospheric Research
-# Licensed under the GPL -- www.gpl.org/licenses/gpl.html
-#
-# <next few lines under version control, do not edit>
-# $URL$
-# $Id$
-# $Revision$
-# $Date$
-
-# DESCRIPTION:
-#
-# The strategy here is that this job has requested a bunch of processors
-# and has many model advances (ensemble members). We are going to assign 
-# a model advance to a processor. Since there are more model advances 
-# (more ensemble members) than processors, we will have to advance a bunch 
-# of ensemble members, wait for that to finish, then do another batch ... 
-# until we have advanced all the ensemble members. We are running each 
-# ensemble member using just one processor, even though some of the models 
-# might be able to take advantage of multiple processors.
-#
-# This script also has logic in it to check to see if the model advances
-# completed successfully. After all the members have been advanced once, 
-# a check is made to see if the desired output files exist. If they do not,
-# ONE attempt is made to gather up all the members that need to be rerun and 
-# give them one more try -- this time without using the processors that failed
-# in the first place. If that fails, we give up.
-#
-# 'Section 1' of filter_server.csh is functionally identical to this script.
-# For that reason, I have kept the same syntax and formatting.
-#
-# This script usually gets invoked when the filter_nml namelist variables are set as:
-# async == 2
-# adv_ens_command = "./advance_ens.csh"
-#
-# If you want to take advantage of the queueing systems, use
-#
-# adv_ens_command = "bsub < advance_ens.csh". 
-#
-#
-# The flow outline: 
-#    'filter' is running in the CENTRALDIR
-#    'filter' creates a semaphore file called batchflag 
-#    'filter' fires off a shell command (advance_ens.csh) to advance the models. 
-#    'filter' then waits patiently for 'batchflag' to disappear.
-#     advance_ens.csh (this script) advances everyone.
-#     when advance_ens.csh completes, remove batchflag ... filter proceeds,
-#     presumably to the assimilation. 
-#
-##=============================================================================
-# The 'common' strategy between batch submission mechanisms is that the batch
-# directives are embedded as shell comments. This provides us with the hope
-# that we can use one script with several different batch mechanisms because
-# the directives for one are interpreted as comments by the other.
-# Both sets of directives are interpreted as comments when the script is 
-# invoked interactively.
-#
-# The number of processors (NPROCS) is determined by one of two things.
-# If this script is run in batch mode, NPROCS is defined by the batch 
-# submission directives.  Run interactively, NPROCS will be 4 ... and
-# multiple processes will get run on the same processor/host.
-#
-##=============================================================================
-## This block of directives constitutes the preamble for the LSF queuing system 
-## LSF is used on the IBM   Linux cluster 'lightning'
-## LSF is used on the IMAGe Linux cluster 'coral'
-## LSF is used on the IBM   cluster 'bluevista'
-## The queues on lightning and bluevista are supposed to be similar.
-##
-## the normal way to submit to the queue is:    bsub < advance_ens_LSF_PBS.csh
-##
-## an explanation of the most common directives follows:
-## -J Job name
-## -o Output files
-## -q queue    cheapest == [standby, economy, (regular,debug), premium] == $$$$
-##=============================================================================
-#BSUB -J advance_ens
-#BSUB -o adv_ens.%J.log
-#BSUB -q regular
-#BSUB -n 8
-
-##=============================================================================
-## This block of directives constitutes the preamble for the PBS queuing system 
-## PBS is used on the CGD   Linux cluster 'bangkok'
-## PBS is used on the CGD   Linux cluster 'calgary'
-##
-## the normal way to submit to the queue is:    qsub advance_ens_LSF_PBS.csh
-##
-## an explanation of the most common directives follows:
-## -N     Job name
-## -r n   Declare job non-rerunable
-## -e <arg>  filename for standard error 
-## -o <arg>  filename for standard out 
-## -q <arg>   Queue name (small, medium, long, verylong)
-## -l nodes=xx:ppn=2   requests BOTH processors on the node. On both bangkok 
-##                     and calgary, there is no way to 'share' the processors 
-##                     on the node with another job, so you might as well use 
-##                     them both.  (ppn == Processors Per Node)
-##=============================================================================
-#PBS -N advance_ens
-#PBS -r n
-#PBS -e advance_ens.err
-#PBS -o advance_ens.log
-#PBS -q medium
-#PBS -l nodes=4:ppn=2
-
-# A common strategy for the beginning is to check for the existence of
-# some variables that get set by the different queuing mechanisms.
-# This way, we know which queuing mechanism we are working with,
-# and can set 'queue-independent' variables for use for the remainder 
-# of the script.
-
-if ($?LS_SUBCWD) then
-
-   # LSF has a list of processors already in a variable (LSB_HOSTS)
-
-   set CENTRALDIR = $LS_SUBCWD
-   set JOBNAME = $LSB_JOBNAME
-   set PROCNAMES = ($LSB_HOSTS)
-   set REMOTECMD = ssh
-   set SCRATCHDIR = /ptmp/${user}
-
-else if ($?PBS_O_WORKDIR) then
-
-   # PBS has a list of processors in a file whose name is (PBS_NODEFILE)
-
-   set CENTRALDIR = $PBS_O_WORKDIR
-   set JOBNAME = $PBS_JOBNAME
-   set PROCNAMES = `cat $PBS_NODEFILE`
-   set REMOTECMD = rsh
-   set SCRATCHDIR = /scratch/local/${user}
-
-else if ($?OCOTILLO_NODEFILE) then
-
-   # ocotillo is a 'special case'. It is the only cluster I know of with
-   # no queueing system.  You must generate a list of processors in a 
-   # file whose name is in $OCOTILLO_NODEFILE.  For example ... 
-   # setenv OCOTILLO_NODEFILE  my_favorite_processors
-   # echo "node1"  > $OCOTILLO_NODEFILE
-   # echo "node5" >> $OCOTILLO_NODEFILE
-   # echo "node7" >> $OCOTILLO_NODEFILE
-   # echo "node3" >> $OCOTILLO_NODEFILE
-
-   set CENTRALDIR = `pwd`
-   set JOBNAME = advance_ens
-   set PROCNAMES = `cat $OCOTILLO_NODEFILE`
-   set REMOTECMD = rsh
-   set SCRATCHDIR = /var/tmp/${user}
-
-else                                    # interactive
-
-   if ( ! $?host) then
-      setenv host `uname -n`
-   endif
-
-   set CENTRALDIR = `pwd`
-   set JOBNAME = interactive_advance_ens
-   set PROCNAMES = "$host $host $host $host"
-   set REMOTECMD = csh
-   set SCRATCHDIR = `pwd`
-
-endif
-
-if ( ! $?REMOVE ) then
-  set REMOVE = 'rm -rf'
-endif
-
-# This job's working directory; must cd to it, or it may run in /home...
-
-cd $CENTRALDIR
-
-set NPROCS = `echo $PROCNAMES | wc -w`
-
-
-# Output to confirm job characteristics
-echo " "
-echo "Running $JOBNAME on host "`hostname`
-echo "Time is "`date`
-echo "(central) directory is $CENTRALDIR"
-echo "This job has allocated $NPROCS processors."
-echo "They are:"
-echo $PROCNAMES
-echo " "
-
-# Set a variable for the semaphore file that indicates the ensemble advances
-# have completed. This MUST be named $CENTRALDIR/batchflag ... so don't change it. 
-# We can log some run-time stuff to the file with no fear. If this does not
-# terminate normally, the batchflag file might contain useful information.
-# If this script terminates normally, the batchflag is removed -- to indicate
-# to the rest of DART that it is done.
-#------------------------------------------------------------------------------
-# This block is exactly the same as 'section 1' of filter_server.csh, with only
-# one exception. There is no dedicated logfile for this block, so we just tack
-# information onto the semaphore file 'batchflag'. If this dies prematurely,
-# batchflag still exists and might contain something useful.
-#------------------------------------------------------------------------------
-
-set MASTERLOG = ${CENTRALDIR}/batchflag
-
-      # First line of filter_control should have number of model states to be integrated
-      if (-e filter_control) then
-         set nensemble = `head -1 filter_control`
-      else
-         set nensemble = 1
-      endif
-      echo "$JOBNAME - advancing $nensemble members at " `date`  >> $MASTERLOG
-      echo "$JOBNAME - advancing $nensemble members at " `date`
-
-      # figure # batches of model runs to do, from # ensemble members and # processors
-      @ nbatch = $nensemble / $NPROCS
-      if ($nensemble % $NPROCS != 0 ) @ nbatch++
-
-      # Create a directory for each member to run in for namelists
-      set element = 0
-      set batch = 1
-      while ($batch <= $nbatch)
-
-         # Advance the model for each ensemble member
-         # advance_model has an additional optional arg for mpi "machines"
-         foreach proc ( $PROCNAMES )
-            @ element++
-            if ($element > $nensemble) goto some_elements_done
-
-            set ELEMENTDIR = $SCRATCHDIR/member_$element 
-
-            if ($REMOTECMD == 'csh') then  # interactive
-               set proc = " "
-            endif
-
-            echo "$REMOTECMD $proc  $CENTRALDIR/advance_model.csh $CENTRALDIR $element $ELEMENTDIR"
-            echo "$REMOTECMD $proc  $CENTRALDIR/advance_model.csh $CENTRALDIR $element $ELEMENTDIR" >> $MASTERLOG
-                  $REMOTECMD $proc  $CENTRALDIR/advance_model.csh $CENTRALDIR $element $ELEMENTDIR &
-         end
-
-         echo "$JOBNAME - waiting to finish ensemble batch $batch of $nbatch at "`date`  >> $MASTERLOG
-         echo "$JOBNAME - waiting to finish ensemble batch $batch of $nbatch at "`date`
-         wait
-
-         @ batch++
-      end
-      some_elements_done:
-
-      # Make sure all processes complete before continuing
-      wait
-
-      # Count all the 'full-size' assim_model_state* files to see if there
-      # are any model advances that need to be rerun. If the file is zero
-      # length, or not present -- we assume the model advance failed for some reason.
-      # We will try again ... once ... and give up if that does not work.
-
-      echo "$JOBNAME - Entering advance_model rerun block at "`date`
-      echo "$JOBNAME - Entering advance_model rerun block at "`date` >> $MASTERLOG
-      echo " " >> $MASTERLOG
-
-      set n = 0
-      set nrerun = 0
-      while ($n < $nensemble)
-         @ n++
-
-         ls -l assim_model_state_ud$n
-
-         if (-z assim_model_state_ud$n || ! -e assim_model_state_ud$n) then
-            @ nrerun++
-            @ procnum = $n % $NPROCS
-
-            echo "$JOBNAME - advance_model - failed procnum is $procnum"
-            echo "$JOBNAME - advance_model - failed procnum is $procnum" >> $MASTERLOG
-
-            if ($procnum == 0) set procnum = $NPROCS
-
-            if ($nrerun == 1) then
-               set rerun = ($n)
-               set badprocs = ($PROCNAMES[$procnum])
-            else
-               set rerun = ($rerun $n)
-               set badprocs = ($badprocs $PROCNAMES[$procnum])
-            endif
-            echo "$JOBNAME - nrerun, rerun = $nrerun $rerun on $PROCNAMES[$procnum]" >> $MASTERLOG
-            echo "$JOBNAME - nrerun, rerun = $nrerun $rerun on $PROCNAMES[$procnum]"
-         else
-            echo "$JOBNAME - assim_model_state_ud$n is fine" >> $MASTERLOG
-            echo "$JOBNAME - assim_model_state_ud$n is fine"
-         endif
-      end
-
-      # If there were some bad model advances, we must find the processors
-      # that worked, and log ones that didn't.  We will resubmit to just the
-      # list of good processors.
-      
-      if ($nrerun > 0) then
-         (echo "$badprocs" | mail -s "node(s) died" $user@ucar.edu)  &
-
-         set ngood = 0
-         set goodprocs = ' '
-         foreach proc ($PROCNAMES)
-            set is_good = 'true'
-            foreach bad ($badprocs)
-               if ($proc == $bad) set is_good = 'false'
-            end
-            if ($is_good == 'true') then
-               @ ngood++
-               if ($ngood == 1 ) then
-                  set goodprocs = ($proc)
-               else
-                  set goodprocs = ($goodprocs $proc)
-               endif
-            endif
-         end
-         echo ' '                                         >> $MASTERLOG
-         echo "Ensemble; nrerun rerun = $nrerun $rerun"   >> $MASTERLOG
-         echo "       good processors = $goodprocs"       >> $MASTERLOG
-         echo ' '                                         >> $MASTERLOG
-      endif
-
-      # The number of batches and nodes limit how many bad nodes can be handled; 
-      # 4 bad processors (=2 bad nodes) x 4 batches = 16 members to redo on 16 procs
-      # Give up if the number to redo is greater than the number of processors.
-
-      if ($nrerun > 0) then
-         set element = 0
-         foreach proc ( $goodprocs )
-            @ element++
-            if ($element > $nrerun) goto all_elements_done
-
-            set ELEMENTDIR = $SCRATCHDIR/member_$rerun[$element] 
-
-            if ($REMOTECMD == 'csh') then # interactive
-               set proc = " "
-            endif
-
-            echo "$REMOTECMD $proc  $CENTRALDIR/advance_model.csh $CENTRALDIR $rerun[$element] $ELEMENTDIR" >> $MASTERLOG
-            echo "$REMOTECMD $proc  $CENTRALDIR/advance_model.csh $CENTRALDIR $rerun[$element] $ELEMENTDIR"
-                  $REMOTECMD $proc  $CENTRALDIR/advance_model.csh $CENTRALDIR $rerun[$element] $ELEMENTDIR &
-
-         end
-         all_elements_done:
-
-         echo "$JOBNAME - waiting to finish ensemble rerun "`date`  >> $MASTERLOG
-         echo "$JOBNAME - waiting to finish ensemble rerun "`date`
-         wait
-      endif
-
-      # OK, we tried. If the ones we needed to rerun are not done now ... give up.
-
-      set n = 0
-      while ($n < $nensemble)
-         @ n++
-         if (-z assim_model_state_ud$n || ! -e assim_model_state_ud$n) then
-            echo "MISSING assim_model_state_ud$n and aborting" >> $MASTERLOG
-            echo "MISSING assim_model_state_ud$n and aborting" >> $MASTERLOG
-            echo "MISSING assim_model_state_ud$n and aborting"
-            echo "MISSING assim_model_state_ud$n and aborting"
-            exit 1
-         endif
-      end
-
-#------------------------------------------------------------------------------
-# At this point, we differ from the async = 3 scenario of filter_server.csh.
-# filter_server.csh communicates by removing a file 'go_advance_model'.
-#
-# This script is invoked when (async = 2) and signals that it has completed
-# by removing the semaphore file '${CENTRALDIR}/batchflag'
-#------------------------------------------------------------------------------
-
-echo "$JOBNAME - Completed this advance at " `date`
-echo "$JOBNAME - ---------"
-pwd
-ls -lt assim_model_state_ud*
-
-# signal ensemble_manager_mod:Aadvance_state() (if async=2) to continue
-${REMOVE} ${CENTRALDIR}/batchflag
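
For reference, the async = 2 handshake described in the header comments of advance_ens.csh can be
sketched from filter's side of the protocol like this (illustrative only, not part of the deleted
script; the real call is made from inside the filter executable via adv_ens_command, and the
10-second polling interval is an assumption):

    touch batchflag                 # filter creates the semaphore file
    ./advance_ens.csh               # fire off the ensemble advance (adv_ens_command)
    while ( -e batchflag )          # wait for advance_ens.csh to remove batchflag
       sleep 10
    end
    # batchflag is gone -- all ensemble members have been advanced; assimilation can proceed.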

Deleted: DART/trunk/shell_scripts/advance_ens_LSF.csh
===================================================================
--- DART/trunk/shell_scripts/advance_ens_LSF.csh	2009-04-16 16:42:53 UTC (rev 3818)
+++ DART/trunk/shell_scripts/advance_ens_LSF.csh	2009-04-16 16:46:27 UTC (rev 3819)
@@ -1,161 +0,0 @@
-#!/bin/csh
-#
-# Data Assimilation Research Testbed -- DART
-# Copyright 2004-2007, Data Assimilation Research Section
-# University Corporation for Atmospheric Research
-# Licensed under the GPL -- www.gpl.org/licenses/gpl.html
-#
-# <next few lines under version control, do not edit>
-# $URL$
-# $Id$
-# $Revision$
-# $Date$
-
-# The number of processors (NPROCS) is determined by one of two things.
-# If this script is run interactively, NPROCS is unity.
-# If this script is run in batch mode, NPROCS is user-defined by  
-# the argument to 'bsub -n xxxx'
-#
-# Initial version to run on lightning IBM Linux cluster
-
-### Job name
-#BSUB -J advance_ens
-### Declare job non-rerunable (default behavior with BSUB?)
-
-### Output files
-#BSUB -o adv_ens.%J.log
-#BSUB -P 86850054
-### Queue charging    cheapest ..... most expensive   (at least on lightning)
-### Queue name (standby, economy, [regular,debug], premium)
-#BSUB -q regular
-#BSUB -n 28
-
-# First line of filter_control should have number of model states to be integrated
-set nensmbl = `head -1 filter_control`
-
-# Determine number of processors
-#
-# list of hosts/machines is in $PROCNAMES
-# the quoting is VERY IMPORTANT for PROCNAMES
-
-if ($?LSB_HOSTS) then                       ;# batch
-   set NPROCS = `echo $LSB_HOSTS | wc -w`
-#   set PROCNAMES = "$LSB_HOSTS"
-   set PROCNAMES = ($LSB_HOSTS)
-else                                        ;# interactive
-   set NPROCS = 1
-   set PROCNAMES = $host
-endif
-
-### This job's working directory; must cd to it, or it will run in /home...
-if ($?LS_SUBCWD) then
-   cd $LS_SUBCWD
-else
-   setenv LS_SUBCWD `pwd`
-endif
-
-### Output to confirm job characteristics
-if ($?LSB_JOBNAME) then
-   echo Running $LSB_JOBNAME on host `hostname`
-else
-   echo "Running on host "`hostname`
-endif
-echo Time is `date`
-echo Directory is `pwd`
-echo This job runs on the following nodes:
-echo $PROCNAMES
-
-echo This job has allocated $NPROCS processors
-
-# figure # batches of runs to do, from # ensemble members and # processors
-@ nbatch = $nensmbl / $NPROCS
-if ($nensmbl % $NPROCS != 0 ) @ nbatch++
-echo $nbatch batches will be executed
-
-# Send jobs to nodes
-set element = 0
-set batch = 1
-while($batch <= $nbatch)
-   foreach proc ( $PROCNAMES )
-      @ element++
-      if ($element > $nensmbl) goto all_elements_done
-
-      ssh $proc "csh $LS_SUBCWD/advance_model.csh $LS_SUBCWD $element /ptmp/${user}/tmp$user$element " &
-
-   end
-# Another way to monitor progress.  batchflag has other info to start,
-# so this echo can be removed and scripts will still work.
-   echo waiting to finish batch $batch of $nbatch >> $LS_SUBCWD/batchflag
-   wait
-   @ batch++
-end
-all_elements_done:
-
-# Wait for all *background* processes to finish up
-wait
-
-# Attempt to rerun members that did not advance successfully.
-set rerun = ' '
-set nrerun = 0
-set goodprocs = ' '
-set badprocs = ' '
-set element = 0
-set batch = 1
-set NPROCS = 0
-while($batch <= $nbatch)
-   set iproc = 1
-   foreach proc ( $PROCNAMES )
-      @ element++
-      if ($element > $nensmbl) goto all_elements_checked
-      set iblo = `expr $element \+ 10000`
-      set iblo = `echo $iblo | cut -c2-5`
-      set blown = `grep $iblo blown_*.out | cat | wc -l`
-      if ($blown == 0 && -e $LS_SUBCWD/assim_model_state_ud$element) then
-#      if ($blown == 0 && -e /ptmp/${user}/tmp${user}${element}/dart_wrf_vector) then
-         set goodprocs = ($goodprocs $PROCNAMES[$iproc])
-         @ NPROCS++
-      else
-         set badprocs = ($badprocs $PROCNAMES[$iproc])
-         set rerun = ($rerun $element)
-         @ nrerun++
-      endif
-      @ iproc++
-   end
-   @ batch++
-end
-all_elements_checked:
-
-rm -f blown_*.out
-
-if ($nrerun > 0) then
-   echo $nrerun members will be rerun
-   @ nbatch = $nrerun / $NPROCS
-   if ($nrerun % $NPROCS != 0 ) @ nbatch++
-   echo $nbatch batches will be executed
-   set element = 0
-   set batch = 1
-   while($batch <= $nbatch)
-      foreach proc ( $goodprocs )
-         @ element++
-         if ($element > $nrerun) goto all_elements_rerun
-
-         ssh $proc "csh $LS_SUBCWD/advance_model.csh $LS_SUBCWD $rerun[$element] /ptmp/${user}/tmp${user}$rerun[$element] " &
-
-      end
-# Another way to monitor progress.  batchflag has other info to start,
-# so this echo can be removed and scripts will still work.
-      echo waiting to finish rerun batch $batch of $nbatch >> $LS_SUBCWD/batchflag
-      wait
-      @ batch++
-   end
-all_elements_rerun:
-endif
-
-# Wait for all *background* processes to finish up
-wait
-
-mkdir -p FAILURES
-mv blown_*.out FAILURES/
-
-# signal to async_filter.csh (if async=1) or to Aadvance_state (if async=2) to continue
-rm -f $LS_SUBCWD/batchflag
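
The rerun check in advance_ens_LSF.csh builds a four-digit, zero-padded member id and greps for it
in the blown_*.out files to decide whether a member's advance blew up. A worked example for member 7
(illustrative only; the member number is hypothetical, the file-name convention is the one used in
the script above):

    set element = 7
    set iblo = `expr $element \+ 10000`        # 10007
    set iblo = `echo $iblo | cut -c2-5`        # 0007
    set blown = `grep $iblo blown_*.out | cat | wc -l`
    # $blown is nonzero if any blown_*.out file mentions member 0007,
    # in which case member 7 is added to the rerun list.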

Deleted: DART/trunk/shell_scripts/assim_filter.csh
===================================================================
--- DART/trunk/shell_scripts/assim_filter.csh	2009-04-16 16:42:53 UTC (rev 3818)
+++ DART/trunk/shell_scripts/assim_filter.csh	2009-04-16 16:46:27 UTC (rev 3819)
@@ -1,329 +0,0 @@
-#!/bin/csh
-#
-# Data Assimilation Research Testbed -- DART
-# Copyright 2004-2007, Data Assimilation Research Section
-# University Corporation for Atmospheric Research
-# Licensed under the GPL -- www.gpl.org/licenses/gpl.html
-#
-# <next few lines under version control, do not edit>
-# $URL$
-# $Id$
-# $Revision$
-# $Date$
-
-# DESCRIPTION:
-#
-#
-#
-#
-#
-#
-#
-# IF YOU READ ONLY ONE COMMENT -- READ THIS ONE.
-# This script is designed to be as generic as possible - and still work.
-# This script exploits the fact that different queueing systems set
-# environment variables. If these variables exist, the values are passed
-# to generic equivalents so that the vast majority of the script stays the
-# same, i.e, we are not dragging around queuing-system-specific variables
-# any longer than necessary.  TJH 13 Dec 2005
-
-##=============================================================================
-## This block of directives constitutes the preamble for the LSF queuing system 
-## LSF is used on the IBM   Linux cluster 'lightning'
-## LSF is used on the IMAGe Linux cluster 'coral'
-## LSF is used on the IBM   cluster 'bluevista'
-## The queues on lightning and bluevista are supposed to be similar.
-##
-## the normal way to submit to the queue is:    bsub < assim_filter.csh
-##
-## an explanation of the most common directives follows:
-## -J Job name
-## -o Output files
-## -q queue    cheapest == [standby, economy, (regular,debug), premium] == $$$$
-##=============================================================================
-#BSUB -J assim_filter
-#BSUB -o assim_filter.%J.o
-#BSUB -q regular
-#BSUB -n 8
-
-##=============================================================================
-## This block of directives constitutes the preamble for the PBS queuing system 
-## PBS is used on the CGD   Linux cluster 'bangkok'
-## PBS is used on the CGD   Linux cluster 'calgary'
-##
-## the normal way to submit to the queue is:    qsub assim_filter.csh
-##
-## an explanation of the most common directives follows:
-## -N     Job name
-## -r n   Declare job non-rerunable
-## -e <arg>  filename for standard error 
-## -o <arg>  filename for standard out 
-## -q <arg>   Queue name (small, medium, long, verylong)
-## -l nodes=xx:ppn=2   requests BOTH processors on the node. On both bangkok 
-##                     and calgary, there is no way to 'share' the processors 
-##                     on the node with another job, so you might as well use 
-##                     them both.  (ppn == Processors Per Node)
-##=============================================================================
-#PBS -N assim_filter
-#PBS -r n
-#PBS -e assim_filter.err
-#PBS -o assim_filter.log
-#PBS -q medium
-#PBS -l nodes=4:ppn=2
-
-# A common strategy for the beginning is to check for the existence of
-# some variables that get set by the different queuing mechanisms.
-# This way, we know which queuing mechanism we are working with,
-# and can set 'queue-independent' variables for use for the remainder 
-# of the script.
-
-if ($?LS_SUBCWD) then
-
-   # LSF has a list of processors already in a variable (LSB_HOSTS)
-
-   set CENTRALDIR = $LS_SUBCWD
-   set JOBNAME = $LSB_JOBNAME
-   set PROCNAMES = ($LSB_HOSTS)
-   set REMOTECMD = ssh
-   set SCRATCHDIR = /ptmp/${user}
-
-else if ($?PBS_O_WORKDIR) then
-
-   # PBS has a list of processors in a file whose name is (PBS_NODEFILE)
-
-   set CENTRALDIR = $PBS_O_WORKDIR
-   set JOBNAME = $PBS_JOBNAME
-   set PROCNAMES = `cat $PBS_NODEFILE`
-   set REMOTECMD = rsh
-   set SCRATCHDIR = /scratch/local/${user}
-
-else if ($?OCOTILLO_NODEFILE) then
-
-   # ocotillo is a 'special case'. It is the only cluster I know of with
-   # no queueing system.  You must generate a list of processors in a 
-   # file whose name is in $OCOTILLO_NODEFILE.  For example ... 
-   # setenv OCOTILLO_NODEFILE  my_favorite_processors
-   # echo "node1"  > $OCOTILLO_NODEFILE
-   # echo "node5" >> $OCOTILLO_NODEFILE
-   # echo "node7" >> $OCOTILLO_NODEFILE
-   # echo "node3" >> $OCOTILLO_NODEFILE
-
-   set CENTRALDIR = `pwd`
-   set JOBNAME = assim_filter
-   set PROCNAMES = `cat $OCOTILLO_NODEFILE`
-   set REMOTECMD = rsh
-   set SCRATCHDIR = /var/tmp/${user}
-
-else                                    # interactive
-
-   if ( ! $?host) then
-      setenv host `uname -n`
-   endif
-
-   set CENTRALDIR = `pwd`
-   set JOBNAME = interactive_assim_filter
-   set PROCNAMES = "$host $host $host $host"
-   set REMOTECMD = csh
-   set SCRATCHDIR = `pwd`
-
-endif
-
-if ( ! $?REMOVE ) then
-  set REMOVE = 'rm -rf'
-endif
-
-# This job's working directory; must cd to it, or it may run in /home...
-
-cd $CENTRALDIR
-
-set NPROCS = `echo $PROCNAMES | wc -w`
-
-
-# Output to confirm job characteristics
-echo " "
-echo "Running $JOBNAME on host "`hostname`
-echo "Time is "`date`
-echo "(central) directory is $CENTRALDIR"
-echo "This job has allocated $NPROCS processors."
-echo "They are:"
-echo $PROCNAMES
-echo " "
-
-#------------------------------------------------------------------------------
-# This block is exactly the same as 'section 2' of filter_server.csh, with only
-# one exception. There is no dedicated logfile for this block, so we just tack
-# information onto the semaphore file 'batchflag'. If this dies prematurely,
-# batchflag still exists and might contain something useful.
-#------------------------------------------------------------------------------
-
-set MASTERLOG = ${CENTRALDIR}/batchflag
-
-      # First line of assim_region_control is the number of regions to be assimilated
-      if ( -e assim_region_control) then
-         set nregions = `head -1 assim_region_control`
-      else
-         set nregions = 1
-      endif
-      echo "$JOBNAME - assimilating $nregions regions at " `date` >> $MASTERLOG
-      echo "$JOBNAME - assimilating $nregions regions at " `date`
-
-      # figure # batches of model runs to do, from # regions and # processors
-      @ nbatch = $nregions / $NPROCS
-      if ($nregions % $NPROCS != 0 ) @ nbatch++
-
-      # Create a directory for each member to run in for namelists
-      set element = 0
-      set batch = 1
-      while ($batch <= $nbatch)
-         foreach proc ( $PROCNAMES )
-            @ element++
-            if ($element > $nregions) goto some_regions_done
-
-            set ELEMENTDIR = $SCRATCHDIR/region_$element 
-
-            if ($REMOTECMD == 'csh') then  # interactive
-               set proc = " "
-            endif
-
-      echo "$REMOTECMD $proc  $CENTRALDIR/assim_region.csh $CENTRALDIR $element $ELEMENTDIR" >> $MASTERLOG
-      echo "$REMOTECMD $proc  $CENTRALDIR/assim_region.csh $CENTRALDIR $element $ELEMENTDIR"
-            $REMOTECMD $proc  $CENTRALDIR/assim_region.csh $CENTRALDIR $element $ELEMENTDIR &
-
-         end
-
-         echo "$JOBNAME - waiting to finish regional batch $batch of $nbatch " `date`   >> $MASTERLOG
-         echo "$JOBNAME - waiting to finish regional batch $batch of $nbatch " `date`
-         wait
-
-         @ batch++
-      end
-      some_regions_done:
-
-      # Make sure all (background) processes complete before continuing
-      wait
-
-      # Count all the 'full-size' filter_assim_region_out* files to see if there
-      # are any regional assimilations that need to be rerun. If the file is zero
-      # length, or not present -- we assume an assimilation failed for some reason.
-      # We will try again ... once ... and give up if that does not work.
-
-      echo "$JOBNAME - Entering assim_region rerun block at "`date`
-      echo "$JOBNAME - Entering assim_region rerun block at "`date` >> $MASTERLOG
-      echo " " >> $MASTERLOG
-
-      set n = 0
-      set nrerun = 0
-      while ($n < $nregions)
-         @ n++
-
-         ls -l filter_assim_region_out$n
-
-         if (-z filter_assim_region_out$n || ! -e filter_assim_region_out$n) then
-            @ nrerun++
-            @ procnum = $n % $NPROCS
-
-            echo "$JOBNAME - assim_region - failed procnum is $procnum"
-            echo "$JOBNAME - assim_region - failed procnum is $procnum" >> $MASTERLOG
-
-            if ($procnum == 0) set procnum = $NPROCS
-
-            if ($nrerun == 1) then
-               set rerun = ($n)
-               set badprocs = ($PROCNAMES[$procnum])
-            else
-               set rerun = ($rerun $n)
-               set badprocs = ($badprocs $PROCNAMES[$procnum])
-            endif
-            echo "$JOBNAME - nrerun, rerun = $nrerun $rerun on $PROCNAMES[$procnum]" >> $MASTERLOG
-            echo "$JOBNAME - nrerun, rerun = $nrerun $rerun on $PROCNAMES[$procnum]"
-         else
-            echo "$JOBNAME - filter_assim_region_out$n is fine" >> $MASTERLOG
-            echo "$JOBNAME - filter_assim_region_out$n is fine"
-         endif
-      end
-
-      # If there were some bad regional assimilations, we must find the processors
-      # that worked, and log ones that didn't.  We will resubmit to just the
-      # list of good processors.
-
-      if ($nrerun > 0) then
-
-         set ngood = 0
-         set goodprocs = ' '
-         foreach proc ($PROCNAMES)
-            set is_good = 'true'
-            foreach bad ($badprocs)
-               if ($proc == $bad) set is_good = 'false'
-            end
-            if ($is_good == 'true') then
-               @ ngood++
-               if ($ngood == 1 ) then
-                  set goodprocs = ($proc)
-               else
-                  set goodprocs = ($goodprocs $proc)
-               endif
-            endif
-         end
-         echo ' '                                      >> $MASTERLOG
-         echo "Regions; nrerun rerun = $nrerun $rerun" >> $MASTERLOG
-         echo "      good processors = $goodprocs"     >> $MASTERLOG
-         echo ' '                                      >> $MASTERLOG
-      endif
-
-      # The number of batches and nodes limit how many bad nodes can be handled; 
-      # 4 bad processors (=2 bad nodes) x 4 batches = 16 members to redo on 16 procs
-      # Give up if the number to redo is greater than the number of processors.
-
-      if ($nrerun > 0) then
-         set element = 0
-         foreach proc ( $goodprocs )
-            @ element++
-            if ($element > $nrerun) goto all_regions_done
-
-            set ELEMENTDIR = $SCRATCHDIR/region_$rerun[$element] 
-
-            if ($REMOTECMD == 'csh') then # interactive
-               set proc = " "
-            endif
-
-            echo "$REMOTECMD $proc  $CENTRALDIR/assim_region.csh $CENTRALDIR $rerun[$element] $ELEMENTDIR"
-            echo "$REMOTECMD $proc  $CENTRALDIR/assim_region.csh $CENTRALDIR $rerun[$element] $ELEMENTDIR" >> $MASTERLOG
-                  $REMOTECMD $proc  $CENTRALDIR/assim_region.csh $CENTRALDIR $rerun[$element] $ELEMENTDIR  &
-
-         end
-         all_regions_done:
-
-         echo "$JOBNAME - waiting to finish regions rerun "`date`  >> $MASTERLOG
-         echo "$JOBNAME - waiting to finish regions rerun "`date`
-         wait
-      endif
-
-      # OK, we tried. If the ones we needed to rerun are not done now ... give up.
-
-      set n = 0
-      while ($n < $nregions)
-         @ n++
-         if (-z filter_assim_region_out$n || ! -e filter_assim_region_out$n) then
-            echo "MISSING filter_assim_region_out$n and aborting" >> $MASTERLOG
-            echo "MISSING filter_assim_region_out$n and aborting" >> $MASTERLOG
-            echo "MISSING filter_assim_region_out$n and aborting"
-            echo "MISSING filter_assim_region_out$n and aborting"
-            exit 2
-         endif
-      end
-
-#------------------------------------------------------------------------------
-# At this point, we differ from the async = 3 scenario of filter_server.csh.
-# filter_server.csh communicates by removing a file 'go_assim_regions'.
-#
-# This script is invoked when (async = 2) and signals that it has completed
-# by removing the semaphore file '${CENTRALDIR}/batchflag'
-#------------------------------------------------------------------------------
-
-echo "$JOBNAME - Completed this assimilation at " `date`
-echo "$JOBNAME - ---------"
-pwd
-ls -lt filter_assim_region_out*
-
-# signal to assim_tools_mod:filter_assim() (if async==2) to continue
-${REMOVE} ${CENTRALDIR}/batchflag
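
Both rerun blocks (in advance_ens.csh and assim_filter.csh) map a failed output file back to the
processor that produced it with `$n % $NPROCS`, wrapping a remainder of 0 back to the last
processor. For example, with 8 processors (illustrative arithmetic only):

    set NPROCS = 8
    set n = 16
    @ procnum = $n % $NPROCS                   # 16 % 8 = 0
    if ($procnum == 0) set procnum = $NPROCS   # wrap 0 back to processor 8
    # element 16 ran on the 8th processor in PROCNAMES; element 9 would map to processor 1.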

Deleted: DART/trunk/shell_scripts/assim_filter_LSF.csh
===================================================================
--- DART/trunk/shell_scripts/assim_filter_LSF.csh	2009-04-16 16:42:53 UTC (rev 3818)
+++ DART/trunk/shell_scripts/assim_filter_LSF.csh	2009-04-16 16:46:27 UTC (rev 3819)
@@ -1,77 +0,0 @@
-#!/bin/csh
-#
-# Data Assimilation Research Testbed -- DART
-# Copyright 2004-2007, Data Assimilation Research Section
-# University Corporation for Atmospheric Research
-# Licensed under the GPL -- www.gpl.org/licenses/gpl.html
-#
-# <next few lines under version control, do not edit>
-# $URL$
-# $Id$
-# $Revision$
-# $Date$
-
-# Initial version to run on lightning IBM Linux cluster
-
-### Job name
-#BSUB -J assim_filter
-### Declare job non-rerunable (default behavior with BSUB?)
-
-### Output files
-#BSUB -o assim_filter.%J.o
-### Queue name (economy, regular, premium)
-#BSUB -q regular
-set NPROCS = 9
-#BSUB -n 9
-
-### This job's working directory; must cd to it, or it will run in /home...
-if ($?LS_SUBCWD) then
-   cd $LS_SUBCWD
-else
-   setenv LS_SUBCWD `pwd`
-endif
-
-### Output to confirm job characteristics
-if ($?LSB_JOBNAME) then
-   echo Running $LSB_JOBNAME on host `hostname`
-else
-   echo "Running on host "`hostname`
-endif
-echo Time is `date`
-echo Directory is `pwd`
-echo This job runs on the following nodes:
-echo $LSB_HOSTS
-echo This job has allocated $NPROCS processors
-
-# First line of assim_region_control should have number of regions to be assimilated
-set nregions = `head -1 assim_region_control`
-
-# figure # batches of runs to do, from # regions and # processors
-@ nbatch = $nregions / $NPROCS
-if ($nregions % $NPROCS != 0 ) @ nbatch++
-echo $nbatch batches will be executed
-
-# Send jobs to nodes
-set element = 0
-set batch = 1
-while($batch <= $nbatch)
-   foreach node ( $LSB_HOSTS )
-      @ element++
-      if ($element > $nregions) goto all_elements_done
-
-      ssh $node "csh $LS_SUBCWD/assim_region.csh $LS_SUBCWD $element /ptmp/${user}/tmp$user$element " &
-
-   end
-# Another way to monitor progress.  batchflag has other info to start,
-# so this echo can be removed and scripts will still work.
-   echo waiting to finish batch $batch  >> $LS_SUBCWD/batchflag
-   wait
-   @ batch++
-end
-all_elements_done:
-
-# Wait for all *background* processes to finish up
-wait
-
-# signal to filter_assim to continue
-rm -f $LS_SUBCWD/batchflag
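
All four scripts compute the number of batches as a ceiling division of members (or regions) by
processors. For example, with a hypothetical 80 ensemble members on the 28 processors requested by
advance_ens_LSF.csh (illustrative only):

    set nensmbl = 80
    set NPROCS = 28
    @ nbatch = $nensmbl / $NPROCS              # integer divide: 2
    if ($nensmbl % $NPROCS != 0 ) @ nbatch++   # remainder 24, so 3 batches total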

