[Dart-dev] DART/branches Revision: 11952

dart at ucar.edu dart at ucar.edu
Thu Sep 21 16:59:35 MDT 2017


Author: nancy at ucar.edu
Date: 2017-09-21 16:59:34 -0600 (Thu, 21 Sep 2017)
add a namelist option to make a copy of the buffer before
a broadcast (there was already an option to do this for send/recv).
this is expected to be needed only for mpi implementations
that can't handle array sections properly.
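
as a sketch, turning the new option on looks like this in the
&mpi_utilities_nml namelist (typically in input.nml; the .true. value
is only an example, the option defaults to .false., and the namelist
is read only if the read-enable flag in mpi_utilities_mod.f90 is set):

   &mpi_utilities_nml
      make_copy_before_sendrecv  = .false.
      make_copy_before_broadcast = .true.
   /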

rearrange the code in send, recv, and broadcast so the copy and
no-copy paths are combined in a saner way.  the copy option is now
supported even when large transfers have to be broken up into
multiple operations.
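
as an illustration only (not the code in this commit), the
copy-before-broadcast path amounts to staging the data in a contiguous
temporary, broadcasting that, and copying it back on the receiving
tasks.  'root' is an illustrative name; the other names follow the
module's existing declarations:

   ! sketch only: stage the data in a contiguous temporary so the mpi
   ! library never sees an array section; 'root' is the broadcasting task.
   allocate(tmpdata(itemcount))
   if (myrank == root) tmpdata = srcarray
   call MPI_Bcast(tmpdata, itemcount, datasize, root, my_local_comm, errcode)
   if (myrank /= root) srcarray = tmpdata
   deallocate(tmpdata)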

you should *not* need to make these copies; they cost time
and space.  however, when trying to debug strange mpi problems,
turning them on is a good initial test.

there is still an open question about what the max buffer limit
should be for a single operation; we currently use 2,000,000,000 items
(not 2 * 1024^3) for send, recv, and broadcast.
johnny says it might be necessary to drop this slightly for
broadcast with some of the mpi implementations on cheyenne.
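
for reference, a sketch (again, not the actual code) of breaking a send
larger than SNDRCV_MAXSIZE into multiple operations; 'offset' and
'thischunk' are illustrative names, the rest follow the module's
existing declarations:

   integer :: offset, thischunk

   ! sketch only: send at most SNDRCV_MAXSIZE items per MPI_Ssend call.
   ! with make_copy_before_sendrecv set, each chunk would be copied into
   ! tmpdata before the send instead of passing an array section.
   offset = 1
   do while (offset <= itemcount)
      thischunk = min(SNDRCV_MAXSIZE, itemcount - offset + 1)
      call MPI_Ssend(srcarray(offset:offset+thischunk-1), thischunk, datasize, &
                     dest_id, tag, my_local_comm, errcode)
      offset = offset + thischunk
   enddo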




Modified: DART/branches/rma_trunk/assimilation_code/modules/utilities/mpi_utilities_mod.f90
===================================================================
--- DART/branches/rma_trunk/assimilation_code/modules/utilities/mpi_utilities_mod.f90	2017-09-21 22:41:58 UTC (rev 11951)
+++ DART/branches/rma_trunk/assimilation_code/modules/utilities/mpi_utilities_mod.f90	2017-09-21 22:59:34 UTC (rev 11952)
@@ -72,16 +72,16 @@
 ! this directory.  It is a sed script that comments in and out the interface
 ! block below.  Please leave the BLOCK comment lines unchanged.
 
- !!SYSTEM_BLOCK_EDIT START COMMENTED_IN
- ! interface block for getting return code back from system() routine
- interface
-  function system(string)
-   character(len=*) :: string
-   integer :: system
-  end function system
- end interface
- ! end block
- !!SYSTEM_BLOCK_EDIT END COMMENTED_IN
+! !!SYSTEM_BLOCK_EDIT START COMMENTED_OUT
+! ! interface block for getting return code back from system() routine
+! interface
+!  function system(string)
+!   character(len=*) :: string
+!   integer :: system
+!  end function system
+! end interface
+! ! end block
+! !!SYSTEM_BLOCK_EDIT END COMMENTED_OUT
 
 
 ! allow global sum to be computed for integers, r4, and r8s
@@ -159,9 +159,10 @@
 ! from any task will print regardless of this setting.
 logical :: all_tasks_print      = .false.   ! by default only msgs from 0 print
 
-! make local copy for send/recv.  was needed on an old, buggy version
+! make local array copy for send/recv/bcast.  was needed on an old, buggy version
 ! of the mpi libs but seems unneeded now. 
-logical :: make_copy_before_sendrecv = .false.   ! should not be needed; .true. is very slow
+logical :: make_copy_before_sendrecv  = .false.   ! should not be needed; .true. is very slow
+logical :: make_copy_before_broadcast = .false.   ! should not be needed; .true. is very slow
 
 ! NAMELIST: change the following from .false. to .true. to enable
 ! the reading of this namelist.  This is the only place you need
@@ -171,7 +172,7 @@
 namelist /mpi_utilities_nml/ reverse_task_layout, all_tasks_print, &
                              verbose, async2_verbose, async4_verbose, &
                              shell_name, separate_node_sync, create_local_comm, &
-                             make_copy_before_sendrecv
+                             make_copy_before_sendrecv, make_copy_before_broadcast 
 
 contains
 
@@ -545,57 +546,57 @@
 ! use my task id as the tag; unused at this point.
 tag = myrank
 
-if (make_copy_before_sendrecv) then
-   if (itemcount > SNDRCV_MAXSIZE) then
-      write(errstring, '(a,i12,a,i12)') "MPI msg contains ", itemcount, &
-            " items; above single msg limit of ", SNDRCV_MAXSIZE
-      write(errstring1, '(a)') "cannot make local copy; change 'make_copy_before_sendrecv' to .false. so msg can be sent in multiple chunks"
-      call error_handler(E_ERR,'send_to', errstring, source, revision, revdate, &
-                         text2=errstring1)
-   endif
+if (make_copy_before_sendrecv) allocate(tmpdata(min(itemcount, SNDRCV_MAXSIZE)))
 
-   allocate(tmpdata(itemcount))
 
-   ! this copy should be unneeded, but on the intel fortran 9.0 compiler and mpich
-   ! on one platform, calling this routine with an array section resulted in some
-   ! apparently random memory corruption.  making a copy of the data into a local,
-   ! contiguous buffer before send and receive fixed the corruption.  this should
-   ! be examined at some later time for any performance impact.
-   tmpdata = srcarray
+if (itemcount <= SNDRCV_MAXSIZE) then
 
-   if (verbose) write(*,*) "PE", myrank, ": send_to alloctmp ", itemcount, " dest ", dest_id
-   call MPI_Ssend(tmpdata, itemcount, datasize, dest_id, tag, &
-                 my_local_comm, errcode)
+   if (verbose) write(*,*) "PE", myrank, ": send_to ", itemcount, " dest ", dest_id
+
+   if (.not. make_copy_before_sendrecv) then
+      call MPI_Ssend(srcarray, itemcount, datasize, dest_id, tag, &
+                    my_local_comm, errcode)
+   else
+      ! this copy should be unneeded, but on the intel fortran 9.0 compiler and mpich
+      ! on one platform, calling this subroutine with an array section resulted in some
+      ! apparently random memory corruption.  making a copy of the data into a local,
+      ! contiguous buffer before send and receive fixed the corruption.  this shouldn't
+      ! have been needed, and is a performance/memory sink.
+
+      tmpdata = srcarray
+      call MPI_Ssend(tmpdata, itemcount, datasize, dest_id, tag, &
+                    my_local_comm, errcode)
+   endif
 else
    ! there are a few places in the code where we send/receive a full state vector.
    ! as these get really large, they may exceed the limits of the MPI library.
    ! break large messages up into smaller chunks if needed.  
-   if (itemcount > SNDRCV_MAXSIZE) then

