[Dart-dev] [7002] DART/trunk/obs_sequence: change the 'list' option to require N separate lists

Thu May 22 10:17:32 MDT 2014

Revision: 7002
Author:   nancy
Date:     2014-05-22 10:17:32 -0600 (Thu, 22 May 2014)
Log Message:
-----------
change the 'list' option to require N separate lists
if you are going to compare more than a single set
of obs_seq.final files.  each list contains the filenames
for each experiment.  this is easier for the user to
generate and less confusing all around.  fixes issue DART-30.

also support a new namelist item: eval_and_assim_can_match
which defaults to false for backwards compatibility
but can allow obs which are sucessfully assimilated in
one experiment and sucessfully evaluated in another to 
match.  the implementation is 0 and 1 match, and 2 and 3 match
(assuming the dart_qc_threshold is at the default value
of 3).  fixes issue DART-31.

Modified Paths:
--------------
    DART/trunk/obs_sequence/obs_common_subset.f90
    DART/trunk/obs_sequence/obs_common_subset.html
    DART/trunk/obs_sequence/obs_common_subset.nml

-------------- next part --------------
Modified: DART/trunk/obs_sequence/obs_common_subset.f90
===================================================================

--- DART/trunk/obs_sequence/obs_common_subset.f90	2014-05-21 23:15:14 UTC (rev 7001)
+++ DART/trunk/obs_sequence/obs_common_subset.f90	2014-05-22 16:17:32 UTC (rev 7002)
@@ -4,44 +4,47 @@
 !
 ! $Id$
 
-program obs_common_subset
 
-! This program takes as input one or more sets of N obs_seq filenames. 
-! N is usually 2, but can be any number of files to compare together.
-! It opens the obs_seq files N at a time and does an obs-by-obs comparison.  
-! It is intended to pull out a common subset of obs which were successfully
-! assimilated in all experiments so the subsequent diagnostics can give
-! an apples-to-apples comparison of the relative error or fit to obs
-! between different experiments.
-!
-! This tool does not search in the input obs_seq files to try to match obs
-! in different orders or to skip over obs present in only one input file,
-! so all experiments should be done with identical input obs_seq files.
-! The obs to be assimilated or evaluated can be selected by namelist
-! at run time if they are different, and different experiment settings
-! can be selected by other namelist values.
-!
-! If the next input obs don't match exactly (identical types, times, locations,
-! and DART QC) they are discarded.  If they match, and if the DART QC indicates 
-! they have been successfully assimilated in all experiments, then each of the 
-! N input obs are copied through to the N output obs_seq files.  At the end of 
-! running this tool it creates N output obs_seq files which contain a consistent 
-! set of assimilated obs across the set.
+!>  This program takes as input one or more sets of N obs_seq filenames. 
+!>  N is usually 2, but can be any number of files to compare together.
+!>  It opens the obs_seq files N at a time and does an obs-by-obs comparison.  
+!>  It is intended to pull out a common subset of obs which were successfully
+!>  assimilated in all experiments so the subsequent diagnostics can give
+!>  an apples-to-apples comparison of the relative error or fit to obs
+!>  between different experiments.
+!> 
+!>  This tool does not search in the input obs_seq files to try to match obs
+!>  in different orders or to skip over obs present in only one input file,
+!>  so all experiments should be done with identical input obs_seq files.
+!>  The obs to be assimilated or evaluated can be selected by namelist
+!>  at run time if they are different, and different experiment settings
+!>  can be selected by other namelist values.
+!> 
+!>  If the next input obs don't match exactly (identical types, times, locations,
+!>  and DART QC) they are discarded.  If they match, and if the DART QC indicates 
+!>  they have been successfully assimilated in all experiments, then each of the 
+!>  N input obs are copied through to the N output obs_seq files.  At the end of 
+!>  running this tool it creates sets of N output obs_seq files which contain a 
+!>  consistent set of assimilated obs across the set.
+!> 
+!>  The outputs are suitable for input to the obs_diag or obs_seq_to_netcdf tools
+!>  and there are matlab scripts for comparing the results of N experiments.
+!>  Differences in the output diagnostics will now be because of the experiment 
+!>  differences, not differences in the number of obs assimilated.
+!> 
+!>  Running this program creates sets of N new output files corresponding to each
+!>  of N input files.  The output names are the input filenames with a suffix appended.  
+!>  It can take a list of obs_seq files, either given directly in the namelist or
+!>  a list of N filenames where each file contains one input filename per line.  
+!>  If the lists contain more than 1 filename, this program will loop over each
+!>  set of filenames in the lists until the end of the file.If more than N files
+!> 
+!>  There is an option to allow observations which have been either successfully 
+!>  assimilated or evaluated to match each other.  The default requires them to
+!>  have been treated identically in all experiments.
 
-! The outputs are suitable for input to the obs_diag or obs_seq_to_netcdf tools
-! and there are matlab scripts for comparing the results of N experiments.
-! Differences in the output diagnostics will now be because of the experiment 
-! differences, not differences in the number of obs assimilated.
+program obs_common_subset
 
-! Running this program creates sets of N new output files corresponding to each
-! of N input files.  The output names are the input filenames with a suffix appended.  
-! It can take a list of obs_seq files, either given directly in the namelist or
-! from a separate ascii file with one input filename per line.  If more than N files
-! are specified it will process them N at a time and loop until the end of the list.
-! The filenames should be listed so all files from experiment 1 are first, then all
-! files from experiment 2, up to N experiments.
-
-
 use        types_mod, only : r8, metadatalength
 use    utilities_mod, only : register_module, initialize_utilities,            &
                              find_namelist_in_file, check_namelist_read,       &
@@ -73,30 +76,31 @@
 character(len=32 ), parameter :: revision = "$Revision$"
 character(len=128), parameter :: revdate  = "$Date$"
 
-! max number of files to compare against each other.  this program will loop
-! over sets of files for as long as there are input files in the list.  if you
-! need to compare more than 50 files together, let us know what you are doing
-! because i'd be very interested to hear.  but you can make the maxseq larger
-! in that case.
+! max_num_input_files : maximum total number of input sequence files to be processed.
+integer, parameter   :: max_num_input_files = 5000
 
-integer, parameter      :: maxseq = 50   ! max number of files compared at once
+! max number of files to compare against each other.  if you need to compare 
+! more than 50 files together, let us know what you are doing because i'd be 
+! very interested to hear.  but you can make the maxcomp larger in that case.
 
-type(obs_sequence_type) :: seq_in(maxseq)
-type(obs_sequence_type) :: seq_out(maxseq)
-type(obs_type)          :: obs_in(maxseq), next_obs_in(maxseq)
-type(obs_type)          :: obs_out(maxseq), prev_obs_out(maxseq)
-logical                 :: is_this_last(maxseq)
+integer, parameter      :: maxcomp = 50  ! max number of files compared at once
+
+type(obs_sequence_type) :: seq_in(maxcomp)
+type(obs_sequence_type) :: seq_out(maxcomp)
+type(obs_type)          :: obs_in(maxcomp), next_obs_in(maxcomp)
+type(obs_type)          :: obs_out(maxcomp), prev_obs_out(maxcomp)
+logical                 :: is_this_last(maxcomp)
 logical                 :: wanted
-integer                 :: size_seq_in(maxseq), num_copies_in(maxseq), num_qc_in(maxseq)
-integer                 :: size_seq_out, num_inserted, iunit, io, i, j, k
-integer                 :: max_num_obs(maxseq), file_id, atonce, nsets, offset
+integer                 :: size_seq_in(maxcomp), num_copies_in(maxcomp), num_qc_in(maxcomp)
+integer                 :: size_seq_out, num_inserted, iunit, io, i, j, k, nextfile
+integer                 :: max_num_obs(maxcomp), file_id, atonce, nsets
 integer                 :: num_rejected_badqc, num_rejected_diffqc, num_rejected_other
 integer                 :: num_mismatch_loc, num_mismatch_time, num_mismatch_type
 character(len = 129)    :: read_format
 logical                 :: pre_I_format, cal
-character(len = 256)    :: msgstring, msgstring1, msgstring2, msgstring3
-character(len = 256)    :: filename_in(maxseq)
-character(len = 300)    :: filename_out(maxseq)     ! filename_in + . + suffix
+character(len = 512)    :: msgstring, msgstring1, msgstring2, msgstring3
+character(len = 256)    :: filename_in(maxcomp)
+character(len = 384)    :: filename_out(maxcomp)     ! filename_in + . + suffix
 
 character(len = metadatalength), parameter :: dart_qc_meta_data = 'DART quality control'
 character(len = metadatalength) :: meta_data
@@ -104,29 +108,22 @@
 integer                 :: qc_index
 integer                 :: num_input_sets = 0
 
+character(len = 256) :: temp_filelist(max_num_input_files) = ''
+
 !----------------------------------------------------------------
 ! Namelist input with default values
 
-! max_num_input_files : maximum number of input sequence files to be processed
-! if filenames are specified in the namelist.  if filenames are specified in a
-! separate ascii input file, the list length can be infinite.  for names in the
-! namelist, the len must be > num_to_compare_at_once * number of sets of files 
-! to do.  e.g. if you are comparing 3 files at a time, and want to compare a 
-! series of 100 sets of 3 files, that would be 300 filenames.  make these
-! numbers bigger if these values are too small, and recompile.
-
-integer, parameter   :: max_num_input_files = 5000
-
 character(len = 256) :: filename_seq(max_num_input_files) = ''
-character(len = 256) :: filename_seq_list  = ''
+character(len = 256) :: filename_seq_list(maxcomp)  = ''
 character(len = 32)  :: filename_out_suffix = '.common'
 
 integer :: num_to_compare_at_once = 2
 logical :: print_only = .false.
 
 ! not expected to be changed often, but its here if needed.
-integer :: print_every = 1000
+integer :: print_every = 10000
 integer :: dart_qc_threshold = 3     ! ok if DART QC <= this
+logical :: eval_and_assim_can_match = .false. 
 
 character(len=32) :: calendar = 'Gregorian'
 
@@ -134,7 +131,8 @@
          num_to_compare_at_once,          &
          filename_seq, filename_seq_list, &
          filename_out_suffix, print_only, &
-         print_every, dart_qc_threshold, calendar
+         print_every, dart_qc_threshold,  &
+         calendar, eval_and_assim_can_match
 
 !----------------------------------------------------------------
 ! Start of the program:
@@ -161,19 +159,19 @@
 ! as in the other tools, it is an error to specify both an explicit
 ! list and the name of file for input.
 
-write(msgstring, '(A,I3,A)') "Comparing ", num_to_compare_at_once, " obs_seq files at a time"
+write(msgstring, '(A,I3,A)') "Asking to compare ", num_to_compare_at_once, " obs_seq files at a time"
 call error_handler(E_MSG, "obs_common_subset", msgstring, source,revision,revdate)
 
-if ((num_to_compare_at_once < 2) .or. (num_to_compare_at_once > maxseq)) then
-   write(msgstring, *) "num_to_compare_at_once must be >= 2 and <= ", maxseq
+if ((num_to_compare_at_once < 2) .or. (num_to_compare_at_once > maxcomp)) then
+   write(msgstring, *) "num_to_compare_at_once must be >= 2 and <= ", maxcomp
    call error_handler(E_ERR, "obs_common_subset", msgstring, source,revision,revdate)
 endif
 
 ! if there is only a single file from each experiment to compare, the order
-! is obvious.  if each experiment has a list of input obs_seq.final files from
-! a multi-step assimilation, list all the files from experiment 1, then all
-! files from experiment 2, etc.
-call handle_filenames(num_to_compare_at_once, filename_seq, filename_seq_list, &
+! doesn't matter.  if each experiment has a list of input obs_seq.final files from
+! a multi-step assimilation, list all the files from experiment 1 in list-file 1,
+! all files from experiment 2 in list-file 2, etc.
+call handle_filenames(num_to_compare_at_once, maxcomp, filename_seq, filename_seq_list, &
                       num_input_sets)
 
 if (num_input_sets > 1) then
@@ -202,42 +200,33 @@
 ! if they aren't present.
 
 ! count of input sets was set in the handle_filenames() routine above.
+nextfile = 1
 NUMSETS: do j = 1, nsets
 
    ! if num_to_compare_at_once = 3 and num_input_sets = 100, then this
    ! assumes the input file list is organized in this order:
-   !       a1,b1,c1,  a2,b2,c2, ... a100,b100,c100
+   !       a1,b1,c1, a2,b2,c2, ..., a100,b100,c100
    ! where a, b, c are the 3 separate experiments to compare, 
    ! and 1, 2, ... 100 are the obs_seq.final output files from 100 steps 
-   ! of a multi-step assimilation.
+   ! of a multi-step assimilation.  handle_filenames() arranges this
+   ! order if list files were used.
 
+   ! fill in the names of the current set of obs_seq files
    do i = 1, atonce
-  
-      offset = (i-1)*num_input_sets + j
-
-      ! fill in the names of the current set of obs_seq files
-      filename_in(i) = filename_seq(offset)
-
+      filename_in(i) = filename_seq(nextfile)
+      nextfile = nextfile + 1
    enddo
 
-   ! now filename_in(:) has all the names for this set
+   ! now filename_in(:) has all the names for this set.  read the headers in.
    do i = 1, atonce
   
-      if  ((len(filename_in(i)) == 0) .or. (filename_in(i) == "")) then
-         write(msgstring, *) 'sequence name ', i, ' in set ', j, ' is empty or null'
-         call error_handler(E_ERR,'obs_common_subset', msgstring, &
-            source,revision,revdate)  ! shouldn't happen because already tested for this
-      endif
-
-      ! read in the next set of files.
-
       call read_obs_seq_header(filename_in(i), num_copies_in(i), num_qc_in(i), &
          size_seq_in(i), max_num_obs(i), file_id, read_format, pre_I_format, &
          close_the_file = .true.)
 
       if (size_seq_in(i) == 0) then
          write(msgstring1, *) 'This tool cannot process empty obs_seq files'
-         write(msgstring,*) 'Obs in input sequence file ', trim(filename_in(i))
+         write(msgstring,*) 'No obs found in input sequence file ', trim(filename_in(i))
          call error_handler(E_ERR,'obs_common_subset',msgstring, &
             source,revision,revdate, text2=msgstring1)
       endif
@@ -479,20 +468,22 @@
 
 !---------------------------------------------------------------------
 
-subroutine handle_filenames(setsize, filename_seq, filename_seq_list, &
+subroutine handle_filenames(setsize, maxsets, filename_seq, filename_seq_list, &
                             num_input_sets)
 
 ! sort out the input lists, set the length as a return in num_input_sets,
 ! fill in filename_seq if list was given, and make sure what's specified is consistent.
 
 integer,            intent(in)    :: setsize
+integer,            intent(in)    :: maxsets
 character(len=*),   intent(inout) :: filename_seq(:)
-character(len=*),   intent(in)    :: filename_seq_list
+character(len=*),   intent(in)    :: filename_seq_list(:)
 integer,            intent(out)   :: num_input_sets
 
-integer :: indx, num_input_files
+integer :: indx, num_input_files, i, ifiles, fcount, this_set_filecount
+integer :: files_per_set, nlists, iset, src, dst
 logical :: from_file
-character(len=32) :: source
+character(len=32) :: nname
 
 ! if the user specifies neither filename_seq nor filename_seq_list, error
 ! if the user specifies both, error.
@@ -501,7 +492,7 @@
 ! maxfiles and read it into the explicit list and continue.
 ! set num_input_sets to the count of sets in the list
 
-if (filename_seq(1) == '' .and. filename_seq_list == '') then
+if (filename_seq(1) == '' .and. filename_seq_list(1) == '') then
    msgstring2='One of filename_seq or filename_seq_list must be specified in namelist'
    call error_handler(E_ERR,'handle_filenames',            &
                       'no filenames specified as input',  &
@@ -509,64 +500,143 @@
 endif
 
 ! make sure the namelist specifies one or the other but not both
-if (filename_seq(1) /= '' .and. filename_seq_list /= '') then
+if (filename_seq(1) /= '' .and. filename_seq_list(1) /= '') then
    call error_handler(E_ERR,'handle_filenames', &
        'cannot specify both filename_seq and filename_seq_list in namelist', &
        source,revision,revdate)
 endif
 
-! if they have specified a file which contains a list, read it into
-! the filename_seq array and set the count.
-if (filename_seq_list /= '') then
-   source = 'filename_seq_list'
-   from_file = .true.
-else
-   source = 'filename_seq'
+! count of all filenames, either explicitly in the namelist
+! or specified by a list of files which contain lists of names.
+num_input_files = -1
+nlists = -1
+
+if (filename_seq(1) /= '') then
+
+   ! if the files are given in the namelist, just need to make sure
+   ! there are at least N given (where N == num_to_compare_at_once).
+
+   nname = 'filename_seq'
    from_file = .false.
-endif
 
+   FILELOOP: do indx = 1, max_num_input_files
+      ! an empty name ends the list
+      if (filename_seq(indx) == '') then
+         if (indx < num_to_compare_at_once) then
+            write(msgstring, '(A,I5,A)') trim(nname)//' must contain at least ', &
+                num_to_compare_at_once, ' obs_seq filenames'
+            call error_handler(E_ERR,'handle_filenames', msgstring, &
+                source,revision,revdate)
+         endif
+         num_input_files = indx - 1 
+         exit FILELOOP
+      endif
+   enddo FILELOOP
 
-! debug?
-if (.false.) write(*,*)'filename_seq(1) ',trim(filename_seq(1)),' ', trim(source), ' ',from_file
+   ! make sure the list is an even multiple of setsize
+   if (modulo(num_input_files, setsize) /= 0) then
+      write(msgstring, *) 'number of input files must be an even multiple of ', setsize
+      write(msgstring1, *) 'found ', num_input_files, ' input files'
+      call error_handler(E_ERR,'handle_filenames', msgstring, source,revision,revdate, &
+                         text2=msgstring1)
+   endif
 
-! count of filenames in list
-num_input_files = 0
+   num_input_sets = num_input_files / setsize
+else
 
-! the point of this loop is to count up how many sets of input seq files we have,
-! and to fill in the filename_seq(:) array if a list file was given.
-FILELOOP: do indx = 1, max_num_input_files
-   if (from_file) &
-      filename_seq(indx) = get_next_filename(filename_seq_list, indx)
+   ! if they have specified a file which contains a list, read it into
+   ! a temp array, set the counts, and do some initial error checks.
 
-   ! an empty name ends the list
-   if (filename_seq(indx) == '') then
-      if (indx == 1) then
-         call error_handler(E_ERR,'handle_filenames', &
-             trim(source)//' contains no input obs_seq filenames', &
-             source,revision,revdate)
-      endif
+   nname = 'filename_seq_list'
+   from_file = .true.
 
-      exit FILELOOP
+   ! this is the total running count of files across all lists
+   fcount = 1   
 
-   endif
+   EXPLOOP: do iset = 1, maxsets
+      
+      ! an empty name ends the list of lists
+      if (filename_seq_list(iset) == '') then
+         num_input_files = fcount - 1 
+         nlists = iset - 1
+         if (nlists /= setsize) then
+            write(msgstring1, '(A)') 'number of input file lists in "'//trim(nname)// &
+                                     '" must equal "num_to_compare_at_once"'
+            write(msgstring2, '(2(A,I5))') trim(nname)//' has ', nlists, &
+                  ' items while num_to_compare_at_once is ', setsize
+         
+            call error_handler(E_ERR,'handle_filenames', msgstring1, &
+                               source,revision,revdate, text2=msgstring2)
+         endif
+         exit EXPLOOP
+      endif
+   
+      LISTLOOP: do ifiles=1, max_num_input_files
+         if (fcount >= max_num_input_files) then
+            write(msgstring, *)  'cannot specify more than ',max_num_input_files,' total files'
+            write(msgstring1, *) 'for more, change "max_num_input_files" in the source and recompile'
+            call error_handler(E_ERR,'handle_filenames', msgstring, source,revision,revdate, &
+                               text2=msgstring1)
+         endif
 
-   num_input_files = indx
+         temp_filelist(fcount) = get_next_filename(filename_seq_list(iset), ifiles)
+     
+         ! an empty name ends the list
+         if (temp_filelist(fcount) == '') then
+            if (ifiles == 1) then
+               call error_handler(E_ERR,'handle_filenames', &
+                   trim(filename_seq_list(iset))//' contains no input obs_seq filenames', &
+                   source,revision,revdate)
+            endif
+            this_set_filecount = ifiles-1
+            exit LISTLOOP
+         endif
+         fcount = fcount + 1
+      enddo LISTLOOP
 
-enddo FILELOOP
+      ! make sure each file in the list contains the same number
+      ! of filenames to be compared.  set the target the first time
+      ! through and then compare subsequent counts to be sure they match.
+      if (iset == 1) then
+         files_per_set = this_set_filecount
+      else
+         if (files_per_set /= this_set_filecount) then
+            write(msgstring1, '(A)') trim(filename_seq_list(iset))//' does not contain ' // &
+               'the same number of obs_seq filenames as previous lists'
+            write(msgstring2, '(3(A,I5),A)') 'list 1 contains ', files_per_set, ' filenames while list ', iset, &
+               ' contains ', this_set_filecount, ' filenames'
+            call error_handler(E_ERR,'handle_filenames', msgstring1, &
+               source,revision,revdate, text2=msgstring2)
+         endif
+      endif
+   
+   enddo EXPLOOP
+   
+   num_input_sets = files_per_set
+endif
 
-if (num_input_files == max_num_input_files) then
-   write(msgstring, *) 'cannot specify more than ',max_num_input_files,' files'
-   call error_handler(E_ERR,'handle_filenames', msgstring, source,revision,revdate)
+if (num_input_files >= max_num_input_files) then
+   write(msgstring, *)  'cannot specify more than ',max_num_input_files,' files'
+   write(msgstring1, *) 'for more, change "max_num_input_files" in the source and recompile'
+   call error_handler(E_ERR,'handle_filenames', msgstring, source,revision,revdate, &
+                      text2=msgstring1)
 endif
 
-! make sure the list is an even multiple of setsize
-if (modulo(num_input_files, setsize) /= 0) then
-   write(msgstring, *) 'number of input files must be an even multiple of ', setsize
-   call error_handler(E_ERR,'handle_filenames', msgstring, source,revision,revdate)
+if (from_file) then
+   ! now that we know we have the right number of filenames, copy them
+   ! from the temp list into filename_seq in the right order for processing:  
+   ! first all file 1s from each experiment, then file 2s, etc.
+   ! the temp list is all files from exp1, all files from exp2, etc
+   dst = 0
+   do i=1, num_input_sets
+      do j=1, setsize
+         src = (j-1)*num_input_sets + i
+         dst = dst + 1
+         filename_seq(dst) = temp_filelist(src)
+      enddo
+   enddo
 endif
 
-num_input_sets = num_input_files / setsize
-
 end subroutine handle_filenames
 
 !---------------------------------------------------------------------
@@ -732,8 +802,6 @@
    else
       type_count(this_obs_kind) = type_count(this_obs_kind) + 1
    endif
-!   print *, 'obs kind index = ', this_obs_kind
-!   if(this_obs_kind > 0)print *, 'obs name = ', get_obs_kind_name(this_obs_kind)
 
    call get_next_obs(seq_in, obs, next_obs, is_this_last)
    if (.not. is_this_last) then
@@ -988,17 +1056,19 @@
       return
    endif
    
-   ! here is the first test we expect to be the reason most obs fail:
-   ! if they've been assimilated in one experiment but not another.
-   if (test1_qc /= testN_qc) then
-      num_rejected_diffqc = num_rejected_diffqc + 1
+   ! check first to see if either of the DART QCs indicate
+   ! that this obs was rejected for some reason.  this code
+   ! is testing the first QC multiple times if N > 2 but
+   ! it makes the logic simpler so just repeat the test.
+   if (test1_qc > threshold .or. testN_qc > threshold) then
+      num_rejected_badqc = num_rejected_badqc + 1
       return
    endif
 
-   ! and here is the second:
-   ! if they were not assimilated in any experiment.
-   if (test1_qc > threshold) then
-      num_rejected_badqc = num_rejected_badqc + 1
+   ! this is the test we expect to be the reason most obs fail:
+   ! if they've been assimilated in one experiment but not another.
+   if (.not. match_qc(test1_qc, testN_qc)) then
+      num_rejected_diffqc = num_rejected_diffqc + 1
       return
    endif
 
@@ -1011,6 +1081,39 @@
 
 !---------------------------------------------------------------------
 
+function match_qc(qc1, qc2)
+
+! normal behavior is to return true if the values are equal,
+! and false if not.  but if the "it's ok for evaluated and
+! assimilated obs to match", then allow DART QC value 0 to 
+! match 1, and 2 to match 3, but no other combinations are valid.
+
+integer, intent(in) :: qc1
+integer, intent(in) :: qc2
+logical             :: match_qc
+
+! if the combination of values is ok, return early
+match_qc = .true.
+
+if (qc1 == qc2) return
+
+if (eval_and_assim_can_match) then
+   if (qc1 == 0 .and. qc2 == 1) return
+   if (qc1 == 1 .and. qc2 == 0) return
+
+   if (qc1 == 2 .and. qc2 == 3) return
+   if (qc1 == 3 .and. qc2 == 2) return
+endif
+
+! not one of the allowed matches.  these qcs really
+! are not compatible and the match failed.
+
+match_qc = .false.
+
+end function match_qc
+
+
+!---------------------------------------------------------------------
 end program obs_common_subset
 
 ! <next few lines under version control, do not edit>

Modified: DART/trunk/obs_sequence/obs_common_subset.html
===================================================================
--- DART/trunk/obs_sequence/obs_common_subset.html	2014-05-21 23:15:14 UTC (rev 7001)
+++ DART/trunk/obs_sequence/obs_common_subset.html	2014-05-22 16:17:32 UTC (rev 7002)
@@ -36,40 +36,95 @@
 <H2>Overview</H2>
 
 <P>
-This specialized tool allow you to select sets of observations
-from two or more input observation sequence files
-where the obs were successfully assimilated in all files.
-It creates a new set of observation sequence files containing only that
-subset of the original observations.
+This specialized tool allows you to select subsets of observations
+from two or more observation sequence files output from <em class=file>filter</em>.
+It creates a new set of output observation sequence files containing only
+the observations which were successfully assimilated in all experiments.
 </P>
 <P>
-The intended use of this tool is as part of the process of comparing the results
-from a group of related experiments in which the same observation sequence file
-is used as input for all runs.  The tool cannot process observation sequence
+Experiments using the same input observation sequence file but
+with different configurations (e.g. different inflation values,
+different localization radii, etc) can assimilate different
+numbers of the available observations.  
+In that case there will be differences in the diagnostic plots
+which are not directly relatable to the differences in the quality
+of the assimilation.  If this tool is run on the <em class=file>obs_seq.final</em> 
+files from all the experiments and then the diagnostics are generated,
+only the observations which were assimilated in all experiments will
+contribute to the summary statistics.  A more direct comparison can be
+made and improvements can be correctly attributed to the differences 
+in the experimental parameters.
+</P>
+<P>
+This tool is intended to be used when comparing the results
+from a group of related experiments in 
+which <strong>the exact same input observation sequence file</strong>
+is used for all runs.  The tool cannot process observation sequence
 files which differ in anything other than whether an observation was successfully
-assimilated/evaluated or not.
+assimilated/evaluated or not.  Note that it is fine to add or remove observation
+types from the <em class=code>assimilate_these_obs_types</em> or 
+<em class=code>evaluate_these_obs_types</em> namelist items for different
+experiments.  The output observation
+sequence files will still contain an identical list of observations, with some
+marked with a DART QC indicating 'not assimilated because of namelist control'.
 </P>
 <P>
-Experiments with different configurations can assimilate different
-numbers of the available observations.  In that case there will be differences
-in the diagnostic plots
-which are not directly relatable to the differences in the quality
-of the assimilation.  If this tool is run on the obs_seq.final files from 
-all the experiments and then the diagnostics are generated,
-only the observations which were assimilated in all experiments will
-contribute to the summary statistics.  A more direct comparison can be
-made and improvements attributed to the differences in the experimental 
-parameters.
-</P><P>
 See the <a href="http://www.image.ucar.edu/DAReS/DART/DART_Documentation.php#obs_diagnostics">
 two experiment diagnostic plot</a> documentation for Matlab scripts
 supplied with DART to directly compare the observation diagnostic output
-from multiple experiments (more than two, the script has a poor name).
+from multiple experiments (it does more than two, the script has a poor name).
 </P><P>
 This is one of a set of tools which operate on observation sequence files.
-For a more general purpose tool, see the 
-<a href="obs_sequence_tool.hmtl">obs_sequence_tool</a>.
+For a more general purpose tool see the 
+<a href="obs_sequence_tool.html">obs_sequence_tool</a>, 
+and for a more flexible selection tool see the
+<a href="obs_selection_tool.html">obs_selection_tool</a>.
 </P>
+<H4>Creating an Input Filelist</H4>
+<P>
+One of the inputs to this tool is a list of filenames to compare.
+The filenames can be directly in the namelist file, or they can
+be in a set of separate text files.  The latter may be easier when there
+are more than just a few files to compare.
+</P>
+<P>
+For experiments where there are multiple job steps, and so multiple
+output observation sequence files per experiment, the input to this
+tool would then be a list of lists of filenames.
+Each set of names must be put into a text file with
+each filename on a separate line.
+</P><P>
+If each experiment was run in a different set of directories,
+and if a list of observation sequence filenames was made 
+with the <em class=file>ls</em> command:
+</P>
+<pre>
+> ls exp1/*/obs_seq.final > exp1list
+> cat exp1list
+exp1/step1/obs_seq.final
+exp1/step2/obs_seq.final
+exp1/step3/obs_seq.final
+exp1/step4/obs_seq.final
+> ls exp2/*/obs_seq.final > exp2list
+> cat exp2list
+exp2/step1/obs_seq.final
+exp2/step2/obs_seq.final
+exp2/step3/obs_seq.final
+exp2/step4/obs_seq.final
+> ls exp3/*/obs_seq.final > exp3list
+> cat exp2list
+exp3/step1/obs_seq.final
+exp3/step2/obs_seq.final
+exp3/step3/obs_seq.final
+exp3/step4/obs_seq.final
+</pre>
+<P>Then the namelist entries would be:
+<pre>
+ filename_seq = ''
+ filename_seq_list = 'exp1list', 'exp2list', exp3list'
+ num_to_compare_at_once = 3
+</pre>
+</P>
 
 <!--==================================================================-->
 <!--=================== DESCRIPTION OF A NAMELIST ====================-->
@@ -94,10 +149,11 @@
  filename_seq           = '',
  filename_seq_list      = '',
  filename_out_suffix    = '.common' ,
- print_every            = 1000,
+ print_every            = 10000,
  dart_qc_threshold      = 3,
  calendar               = 'Gregorian',
  print_only             = .false.,
+ eval_and_assim_can_match = .false.,
 /
 </pre>
 </div>
@@ -115,7 +171,7 @@
 
 <TBODY valign=top>
 
-<TR><TD>num_to_process_at_once</TD>
+<TR><TD>num_to_compare_at_once</TD>
     <TD>integer</TD>
     <TD>Number of observation sequence files to compare together at a 
 time.  Most commonly the value is 2, but can be any number.
@@ -127,20 +183,20 @@
     <TD>character(len=256), dimension(5000)</TD>
     <TD>The array of names of the observation sequence files to process.  
 If more than N files (where N is num_to_compare_at_once) are listed,
-they should be ordered so all files from the first experiment are together,
-followed by all files from the second experiment, up to N experiments.
+they should be ordered so the first N files are compared together,
+followed by the next set of N files, etc.
+You can only specify one of filename_seq OR filename_seq_list, not both.  
 </TD></TR>
 
 <TR><TD>filename_seq_list</TD>
-    <TD>character(len=256)</TD>
+    <TD>character(len=256), dimension(100)</TD>
     <TD>An alternative way to specify the list of input
-observation sequence files.  Give the name of a text file which contains, one per
-line, the names of the observation sequence files to process.  You can only
-specify one of filename_seq OR filename_seq_list, not both.  As with the
-<em class=code>filename_seq</em> item,
-if more than N files (where N is num_to_compare_at_once) are listed,
-they should be ordered so all files from the first experiment are together,
-followed by all files from the second experiment, up to N experiments.
+observation sequence files.  Give a list of N filenames which contain, 
+one per line, the names of the observation sequence files to process.
+There should be N files specified (where N is num_to_compare_at_once), and
+the first observation sequence filename listed in each file will be compared
+together, then the second, until the lists are exhausted.
+You can only specify one of filename_seq OR filename_seq_list, not both.  
 </TD></TR>
 
 <TR><TD>filename_out_suffix</TD>
@@ -158,15 +214,17 @@
 
 <TR><TD>dart_qc_threshold</TD>
     <TD>integer</TD>
-    <TD>The limit for the DART QC value.  For an observation which was
+    <TD>Observations with a DART QC value larger than this threshold
+will be discarded.  
+Note that this is the QC value set by <em class=file>filter</em>
+to indicate the outcome of trying to assimilate an observation.  
+This is not related to the incoming data QC.
+For an observation which was
 successfully assimilated or evaluated in both the Prior and Posterior 
 this should be set to 1.  To also include observations which were successfully
-processed in the Prior but not the Posterior, set to 3.  It is not 
-recommended to set it to a higher value than 3.
-Note that this is the QC value set by <em class=file>filter</em>
-to indicate the outcome of
-trying to assimilate an observation.  
-This is not related to the incoming data QC.
+processed in the Prior but not the Posterior, set to 3.  To ignore
+the magnitude of the DART QC values and keep observations only if
+the DART QCs match, set this to any value higher than 7.
 </TD></TR>
 
 <TR><TD>calendar</TD>
@@ -184,6 +242,12 @@
 the input and output files.
 </TD></TR>
 
+<TR><TD>eval_and_assim_can_match</TD>
+    <TD>logical</TD>
+    <TD>Normally .FALSE. . If .TRUE. then observations which were
+either successfully evaluated OR assimilated will match and are kept.
+</TD></TR>
+
 </TBODY> 
 </TABLE>
 </div>

Modified: DART/trunk/obs_sequence/obs_common_subset.nml
===================================================================
--- DART/trunk/obs_sequence/obs_common_subset.nml	2014-05-21 23:15:14 UTC (rev 7001)
+++ DART/trunk/obs_sequence/obs_common_subset.nml	2014-05-22 16:17:32 UTC (rev 7002)
@@ -4,8 +4,9 @@
    filename_seq_list      = '',
    filename_out_suffix    = '.common' ,
    calendar               = 'Gregorian',
-   print_every            = 1000,
+   print_every            = 10000,
    dart_qc_threshold      = 3,
    print_only             = .false.,
+   eval_and_assim_can_match = .false.
 /