<p><b>dwj07@fsu.edu</b> 2012-06-22 10:02:33 -0600 (Fri, 22 Jun 2012)</p><p><br>

        -- BRANCH COMMIT --<br>

<br>

        Updating multiple blocks design doc.<br>

</p><hr noshade><pre><font color="gray">Modified: branches/omp_blocks/docs/mpas_multiple_blocks.pdf

===================================================================

(Binary files differ)

Modified: branches/omp_blocks/docs/mpas_multiple_blocks.tex

===================================================================

--- branches/omp_blocks/docs/mpas_multiple_blocks.tex        2012-06-21 20:01:09 UTC (rev 2000)

+++ branches/omp_blocks/docs/mpas_multiple_blocks.tex        2012-06-22 16:02:33 UTC (rev 2001)

@@ -170,271 +170,229 @@

 \begin{figure}[H!]

         \centering

-        \includegraphics[scale=0.35]{DesignLayout.eps}

+        \includegraphics[scale=0.4]{DesignLayout.eps}

         \caption{Layout of modules for input/output with multiple blocks}

         \label{fig:module_layout}

 \end{figure}

-To begin, a change to the exchange lists need to be made to support cleaner

-versions of local copies. The current structure of an exchange list can be seen

-below

-\begin{lstlisting}[language=fortran,escapechar=@,frame=single]

-type exchange_list

-  integer :: procID

-  integer :: blockID

-  integer :: nlist

-  integer, dimension(:), pointer :: list

-  type (exchange_list), pointer :: next

-  real (kind=RKIND), dimension(:), pointer :: rbuffer

-  integer, dimension(:), pointer           :: ibuffer

-  integer :: reqID

-end type exchange_list

-\end{lstlisting}

+The changes made to the MPAS framework are described in the following sections.

-The exchange lists requires less information now, and can be seen below:

+\section{Changes in mpas\_dmpar}

+This section covers the changes made within the mpas\_dmpar.F file and the new

+functionality they provide. The first major change is that the dmpar module is

+renamed to comm (communications), so mpas\_dmpar.F becomes mpas\_comm.F.

+\subsection{Data types}

+Previously, the derived data type mpas\_exchange\_list was used to build a linked list representing an exchange list (send or receive). Exchange lists have since been modified and now have the following structure:

 \begin{lstlisting}[language=fortran,escapechar=@,frame=single]

 type mpas_exchange_list

-  integer :: endPointID

-  integer :: nlist

-  integer, dimension(:), pointer :: srcList

-  integer, dimension(:), pointer :: destList

-  type (exchange_list), pointer :: next

+ integer :: endPointID

+ integer :: nlist

+ integer, dimension(:), pointer :: srcList

+ integer, dimension(:), pointer :: destList

+ type (mpas_exchange_list), pointer :: next

 end type mpas_exchange_list

 \end{lstlisting}

-where endPointID can represent a procID or a blockID, srcList represents where

-the data should come from, and destList represents where the data should go.

-These are different depending on the specific type of exchange list provided.

-In the case of a sendList, srcList represents the indices in the current field's

-array to grab data from while destList represents where in the buffer array

-this data should go. These are opposite for recvLists. While copyLists are

-built on the sending block side, and srcList represents the indices in the

-current field's array to get the data from while destList represnts the indices

-in the receiving block's array's field to put the data into.

+Within this structure, endPointID is either a blockID (for local copies) or a procID (for MPI sends and receives), nlist is the number of elements to be communicated through this exchange list, and srcList and destList hold the indices used to pack data into and unpack data out of communication buffers, or to copy data directly between arrays for local copies.

+Because exchange lists no longer contain arrays for the communication buffers, a new data type was created to hold them:

+

 \begin{lstlisting}[language=fortran,escapechar=@,frame=single]

 type mpas_communication_list

-  integer :: procID

-  integer :: nlist

-  type (exchange_list), pointer :: next

-  real (kind=RKIND), dimension(:), pointer :: rbuffer

-  integer, dimension(:), pointer           :: ibuffer

-  integer :: reqID

+ integer :: procID

+ integer :: nlist

+ real (kind=RKIND), dimension(:), pointer :: rbuffer

+ integer, dimension(:), pointer :: ibuffer

+ integer :: reqID

+ type (mpas_communication_list), pointer :: next

 end type mpas_communication_list

 \end{lstlisting}

-where the communication list is created with the understanding that it handles

-MPI send/recvs, procID is either the receiving processorID or the sending

-processorID from the other end of the communication, nlist represents the total

-number of elements in this communication list, rbuffer and ibuffer represent

-the data buffers for the communications, and reqID represents the mpi request id.

+An mpas\_communication\_list is intended only for MPI communications, and communication lists are created and destroyed each time a communication is performed. Within the structure, procID is the processor ID of the other end point of the communication, nlist is the number of elements to be communicated through the buffer, rbuffer and ibuffer point to the buffers of reals and integers respectively, reqID holds the MPI request ID, and next links the communication lists into a linked list.
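The following is a minimal serial sketch of how communication lists could be aggregated per processor. It assumes only the two type layouts shown above. The helper names (add_to_comm_list, add_exch_list, allocate_real_buffers) and the local RKIND definition are hypothetical stand-ins, not mpas_comm routines. Each sendList node from every block adds its nlist to the communication list for its endPointID (a procID). As a result, a processor ends up with one buffer per remote processor rather than one per block.

```fortran
! Serial sketch (assumption): aggregate per-block send lists into one
! communication list per remote processor, then allocate the buffers.
module comm_list_sketch
  implicit none
  integer, parameter :: RKIND = selected_real_kind(12)   ! stand-in for MPAS RKIND

  type mpas_exchange_list
    integer :: endPointID
    integer :: nlist
    integer, dimension(:), pointer :: srcList => null()
    integer, dimension(:), pointer :: destList => null()
    type (mpas_exchange_list), pointer :: next => null()
  end type mpas_exchange_list

  type mpas_communication_list
    integer :: procID
    integer :: nlist
    real (kind=RKIND), dimension(:), pointer :: rbuffer => null()
    integer, dimension(:), pointer :: ibuffer => null()
    integer :: reqID
    type (mpas_communication_list), pointer :: next => null()
  end type mpas_communication_list

contains

  ! Add nlist elements for procID, creating a communication list if needed.
  subroutine add_to_comm_list(commListHead, procID, nlist)
    type (mpas_communication_list), pointer :: commListHead
    integer, intent(in) :: procID, nlist
    type (mpas_communication_list), pointer :: commList

    commList => commListHead
    do while (associated(commList))
      if (commList % procID == procID) then
        commList % nlist = commList % nlist + nlist
        return
      end if
      commList => commList % next
    end do

    allocate(commList)                ! first list for this processor
    commList % procID = procID
    commList % nlist = nlist
    commList % next => commListHead   ! prepend to the linked list
    commListHead => commList
  end subroutine add_to_comm_list

  ! Walk one block's sendList (endPointID holds a procID here).
  subroutine add_exch_list(exchListHead, commListHead)
    type (mpas_exchange_list), pointer :: exchListHead
    type (mpas_communication_list), pointer :: commListHead
    type (mpas_exchange_list), pointer :: exchList

    exchList => exchListHead
    do while (associated(exchList))
      call add_to_comm_list(commListHead, exchList % endPointID, exchList % nlist)
      exchList => exchList % next
    end do
  end subroutine add_exch_list

  ! Size each real buffer by its accumulated nlist.
  subroutine allocate_real_buffers(commListHead)
    type (mpas_communication_list), pointer :: commListHead
    type (mpas_communication_list), pointer :: commList

    commList => commListHead
    do while (associated(commList))
      allocate(commList % rbuffer(commList % nlist))
      commList % rbuffer(:) = 0.0_RKIND
      commList => commList % next
    end do
  end subroutine allocate_real_buffers

end module comm_list_sketch
```

In a real exchange the same counting would be done for receive lists, and the allocated buffers would then be passed to the MPI calls. This sketch stops at allocation.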

-Communication lists are used to aggregate mpi communication

-buffers across multiple blocks.

+Two additional data structures were created to support exchange lists for multiple halo layers and to make it easy to swap pointers to these lists:

-Changes within the framework of MPAS can be seen below. To begin, a name change

-to the dmpar module is proposed. The new name will be mpas\_comm.F rather than

-mpas\_dmpar.F.

+\begin{lstlisting}[language=fortran,escapechar=@,frame=single]

+type mpas_exchange_list_pointer

+ type (mpas_exchange_list), pointer :: exchList

+end type mpas_exchange_list_pointer

-\begin{lstlisting}[language=fortran,escapechar=@,frame=single]

-mpas_dmpar_alltoall_field

-mpas_dmpar_exch_halo_field

-mpas_dmpar_get_owner_list

-mpas_input_state_for_domain

+type mpas_multihalo_exchange_list

+ type (mpas_exchange_list_pointer), dimension(:), pointer :: halos

+end type mpas_multihalo_exchange_list

 \end{lstlisting}

-Within the allToAll and exch\_halo routines, loops over multiple blocks need to

-be added where they are not currently in place. Also, shared memory copies

-within local blocks need to be added.

+Together, these four structures support both local (shared-memory) copies and MPI communications.

-The old allToAll interfaces will have to change in order to accommodate the new

-field data types. The old interface looks like

-\begin{lstlisting}[language=fortran,escapechar=@,frame=single]

-subroutine mpas_dmpar_alltoall_field1d_real(dminfo, arrayIn, 

-                          arrayOut, nOwnedList, nNeededList, 

-                          sendList, recvList)

-\end{lstlisting}

+\subsection{mpas\_comm\_get\_exch\_list}

+The routine previously named mpas\_dmpar\_get\_owner\_list has been renamed to mpas\_comm\_get\_exch\_list. It creates the send, receive, and copy lists used to move data from one processor (or block) to another. During initialization these lists are used by the allToAll communication routines (described below), but within the cores the routine is used almost exclusively for halo exchanges.

-In this case, arrayIn and arrayOut are simple arrays representing 1d real

-values (there are other interfaces for integers, chars, and multi-dimensional

-arrays but they are all similar), nOwnedList and nNeededList are integers

-representing the sizes of arrayIn and arrayOut respectively, and sendList and

-recvList are exchange lists describing how the data from arrayIn needs to be

-communicated into arrayOut.

+It works from two lists of element IDs (typically cells, edges, or vertices): a list of owned elements and a list of needed elements. The needed list is passed round-robin from processor to processor until it returns to the processor it started on. Each processor that owns an element in the list marks it as owned, so that when the list returns, the processor that needs the element knows which processor it will receive the element from.

-The proposed new interface looks like

+Send lists are created during the round-robin communication, as needed elements are marked as owned. Receive lists are created once the round-robin communication has finished, and copy lists are then created in a similar fashion to send lists.
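The round-robin ownership search can be sketched serially as shown below. This is a simplified, hypothetical stand-in: round_robin_owners is not an MPAS routine. The sketch makes three simplifications:

* Ownership is given as two parallel arrays rather than as per-processor fields.
* Only one processor's needed list is traced.
* Each hand-off of the list to the next processor is simulated by a loop step instead of an MPI send/recv.

An owner of -1 marks an element that no processor claimed.

```fortran
! Serial sketch (assumption) of the round-robin owner search; not MPAS code.
module owner_search_sketch
  implicit none
contains

  ! The needed list of processor myProc visits every processor in turn
  ! (myProc, myProc+1, ..., wrapping around). Each visited processor marks
  ! the needed IDs it owns, so when the list is back home neededOwner(i)
  ! names the processor to receive neededIDs(i) from (-1 if nobody owns it).
  subroutine round_robin_owners(myProc, nProcs, ownedIDs, ownedProc, &
                                neededIDs, neededOwner)
    integer, intent(in) :: myProc, nProcs
    integer, dimension(:), intent(in) :: ownedIDs    ! owned element IDs, all procs
    integer, dimension(:), intent(in) :: ownedProc   ! owning proc of each ownedIDs(j)
    integer, dimension(:), intent(in) :: neededIDs   ! IDs myProc needs
    integer, dimension(:), intent(out) :: neededOwner
    integer :: step, holder, i, j

    neededOwner(:) = -1
    holder = myProc
    do step = 1, nProcs
      do i = 1, size(neededIDs)
        if (neededOwner(i) /= -1) cycle          ! already claimed
        do j = 1, size(ownedIDs)
          if (ownedProc(j) == holder .and. ownedIDs(j) == neededIDs(i)) then
            neededOwner(i) = holder              ! holder will send this element
            exit
          end if
        end do
      end do
      holder = mod(holder + 1, nProcs)           ! pass the list along
    end do
  end subroutine round_robin_owners

end module owner_search_sketch
```

In the routine described above, the owning processor would also record the element in its send list at the moment it marks it. That is why send lists can be built during the round-robin pass, while receive lists must wait until the list has come home.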

-\begin{lstlisting}[language=fortran,escapechar=@,frame=single]

-subroutine mpas_dmpar_alltoall_field1d_integer(dminfo, 

-                                    fieldIn, fieldOut)

-\end{lstlisting}

+\subsection{allToAll routines}

+mpas\_comm provides a set of allToAll routines that distribute a field from one processor to one or more other processors. Here the data being sent includes the 0 halo (owned) elements, in contrast to the exch\_halo routines, which communicate only the 1+ halo elements.

-Where fieldIn and fieldOut are pointers to the fields that need to be

-communicated. In this case the exchange lists are stored within the field, or

-possibly the field \% block data structure. Both of the two fields represent a

-linked list of fields, where each of the fields in this linked list is the

-field for a given block. Each of the fields in the linked list also has it's

-own unique exchange lists.

-

-In order to handle allToAll communications, the following pseudo code is used

+A pseudocode version of the allToAll routines can be seen below.

 \begin{lstlisting}[language=fortran,escapechar=@,frame=single]

 loop over fieldOut list

   loop over recvList for specific field

-    initiate mpi_irecv for specific recvList

+    create new communication list if needed

   end loop

 end loop

+allocate recvList buffers and initiate mpi_irecvs

+

 loop over fieldIn list

   loop over sendList for specific field

-    if sendList % procID == dminfo % my_proc_id

-      loop over fieldOut list

-        loop over copyList for specific field

-          if copyList % blockID == fieldIn_ptr % block % blockID

-            copy data from fieldIn_ptr to fieldOut_ptr

-          end if

-        end loop

-      end loop

-    else

-      pack data from fieldIn_ptr

-      initiate mpi_isend for specific sendList

-    end if

+    create new communication list if needed

   end loop

 end loop

-loop over fieldOut list

+allocate sendList buffers, copy data into buffer

+  initiate mpi_isends

+

+loop over fieldIn list

+  loop over copyList for specific field

+    loop over fieldOut list

+      if fieldOut % blockID == copyList % endPointID

+        use copy list to copy from fieldIn to fieldOut

+      end if

+    end loop

+  end loop

+end loop

+

+wait for mpi_irecvs

+  unpack buffer into fields

+

+wait for mpi_isends

+

+destroy buffers

+\end{lstlisting}
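The pack and unpack steps in the pseudocode reduce to indexed copies through a buffer. The sketch below assumes the mpas_exchange_list layout shown earlier and the index conventions for send and receive lists:

* For a sendList, srcList indexes the field and destList indexes the buffer.
* For a recvList, srcList indexes the buffer and destList indexes the field.

The MPI transfer is replaced with a plain shared array. pack_buffer, unpack_buffer, make_list, and the local RKIND are hypothetical names used only for illustration.

```fortran
! Serial sketch (assumption): pack with a sendList, hand the buffer over
! (standing in for mpi_isend/mpi_irecv), then unpack with a recvList.
module pack_unpack_sketch
  implicit none
  integer, parameter :: RKIND = selected_real_kind(12)   ! stand-in for MPAS RKIND

  type mpas_exchange_list
    integer :: endPointID
    integer :: nlist
    integer, dimension(:), pointer :: srcList => null()
    integer, dimension(:), pointer :: destList => null()
    type (mpas_exchange_list), pointer :: next => null()
  end type mpas_exchange_list

contains

  ! Hypothetical helper: fill one exchange list node from index arrays.
  subroutine make_list(exchList, endPointID, src, dest)
    type (mpas_exchange_list), intent(out) :: exchList
    integer, intent(in) :: endPointID
    integer, dimension(:), intent(in) :: src, dest
    exchList % endPointID = endPointID
    exchList % nlist = size(src)
    allocate(exchList % srcList(size(src)), exchList % destList(size(dest)))
    exchList % srcList(:) = src
    exchList % destList(:) = dest
  end subroutine make_list

  ! sendList: srcList indexes the field, destList indexes the buffer.
  subroutine pack_buffer(sendList, field, buffer)
    type (mpas_exchange_list), intent(in) :: sendList
    real (kind=RKIND), dimension(:), intent(in) :: field
    real (kind=RKIND), dimension(:), intent(inout) :: buffer
    integer :: i
    do i = 1, sendList % nlist
      buffer(sendList % destList(i)) = field(sendList % srcList(i))
    end do
  end subroutine pack_buffer

  ! recvList: srcList indexes the buffer, destList indexes the field.
  subroutine unpack_buffer(recvList, buffer, field)
    type (mpas_exchange_list), intent(in) :: recvList
    real (kind=RKIND), dimension(:), intent(in) :: buffer
    real (kind=RKIND), dimension(:), intent(inout) :: field
    integer :: i
    do i = 1, recvList % nlist
      field(recvList % destList(i)) = buffer(recvList % srcList(i))
    end do
  end subroutine unpack_buffer

end module pack_unpack_sketch
```

In this sketch each sendList's destList already holds absolute buffer positions. Several blocks on the same processor can therefore pack into one shared buffer independently, which is what lets communications be aggregated per processor rather than per block.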

+

+\subsection{exch\_halo routines}

+mpas\_comm provides a set of halo exchange routines that send the 0 halo elements of each block to the other blocks that hold those elements in their 1+ halo regions.

+

+A pseudocode version of the halo exchange routines can be seen below.

+\begin{lstlisting}[language=fortran,escapechar=@,frame=single]

+loop over field list

   loop over recvList for specific field

-    wait for mpi_irecv to finish

-        unpack data into fieldOut_ptr

+    create new communication list if needed

   end loop

 end loop

-loop over fieldIn list

+allocate recvList buffers and initiate mpi_irecvs

+

+loop over field list

   loop over sendList for specific field

-    wait for mpi_isend to finish

+    create new communication list if needed

   end loop

 end loop

+

+allocate sendList buffers, copy data into buffer

+  initiate mpi_isends

+

+loop over field list

+  loop over copyList for specific field

+    loop over field list (search for the receiving block)

+      if receiving field % blockID == copyList % endPointID

+        use copy list to copy from the sending field to the receiving field

+      end if

+    end loop

+  end loop

+end loop

+

+wait for mpi_irecvs

+  unpack buffer into fields

+

+wait for mpi_isends

+

+destroy buffers

 \end{lstlisting}

-The only changes within the mpas\_dmpar\_exch\_halo\_field are internal, and

-only refer to the addition of checking copyList for shared memory copies. These

-changes should be similar to the internal changes to the allToAll routine

-changes presented within the pseudocode above.

+\subsection{copy routines}

+mpas\_comm provides a set of routines that copy a field from the head block of a list of owned blocks to all other blocks in that list. These are needed during initialization to copy non-decomposed fields (fields that do not have nCells, nEdges, or nVertices as a dimension) to all owned blocks.
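Below is a minimal sketch of such a copy routine. It assumes a simplified field type (field1d_sketch, hypothetical) that holds one block's array and a next pointer. It also assumes every block's array is already allocated with the same shape. The head block's values are copied into every other block in the list.

```fortran
! Sketch (assumption): copy a non-decomposed field from the head block
! to all other blocks in a linked list of per-block fields.
module copy_field_sketch
  implicit none
  integer, parameter :: RKIND = selected_real_kind(12)   ! stand-in for MPAS RKIND

  ! Hypothetical, simplified stand-in for one block's 1d real field.
  type field1d_sketch
    integer :: blockID
    real (kind=RKIND), dimension(:), pointer :: array => null()
    type (field1d_sketch), pointer :: next => null()
  end type field1d_sketch

contains

  subroutine copy_from_head(fieldHead)
    type (field1d_sketch), pointer :: fieldHead
    type (field1d_sketch), pointer :: fieldCursor

    if (.not. associated(fieldHead)) return
    fieldCursor => fieldHead % next          ! every block after the head
    do while (associated(fieldCursor))
      fieldCursor % array(:) = fieldHead % array(:)
      fieldCursor => fieldCursor % next
    end do
  end subroutine copy_from_head

end module copy_field_sketch
```

No communication is involved here. All blocks in the list are owned by the same processor, so the copy is a plain array assignment.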

-The whole structure of mpas\_dmpar\_get\_owner\_list has to change in order to

-support multiple blocks. This routine currently builds the exchange lists for a

-single block and has global communications. In order to handle the creation of

-exchange lists with multiple blocks, this routine will be rewritten to use the

-field data types. This way each field will be a linked list, consisting of the

-fields from each block. The downside to this change, is that now the routine

-requires the setup of basic block types (really just the blockID number), and

-block fields (like indexToCellID) prior to calling this. However, this keeps

-the data types used in the creation of these routines in line with how the rest

-of MPAS deals with fields.

+\subsection{Utility routines}

+In addition to the communication routines, mpas\_comm provides utility routines to initialize and destroy all of the derived data types discussed above.

-The previous interface for mpas\_dmpar\_get\_owner\_list can be seen below

+\section{mpas\_block\_creator.F}

+mpas\_block\_creator is a new MPAS module used to create computational blocks. Given the 0 halo information, its routines create the 1+ halo regions for cells, edges, and vertices. They also initialize the list of local blocks.

+\subsection{Cell Routines}

+The routines that set up cell fields within a block are:

 \begin{lstlisting}[language=fortran,escapechar=@,frame=single]

-subroutine mpas_dmpar_get_owner_list(dminfo,

-                    nOwnedList, nNeededList,

-                      ownedList, neededList,

-               sendList, recvList, inOffset)

+mpas_block_creator_setup_blocks_and_0halo_cells

+mpas_block_creator_build_0halo_cell_fields

+mpas_block_creator_build_cell_halos

 \end{lstlisting}

-where in this case, ownedList and neededList are simple arrays representing

-indices of owned and needed elements, nOwnedList and nNeededList are the number

-of elements in each respective list, sendList and recvList are output fields to

-represent the send and receive lists for the communications between these

-fields, and inOffset is an offset for receive lists.

+These routines should be called in the order listed to set up the 0 and 1+ halos of cells. However, mpas\_block\_creator\_build\_cell\_halos should not be called until all 0 halo fields, including those for edges and vertices, have been set up.

-The proposed new interface for this can be seen below

+\subsection{Edge/Vertex Routines}

+The routines that set up edge and vertex fields within a block are:

 \begin{lstlisting}[language=fortran,escapechar=@,frame=single]

-subroutine mpas_get_exchange_lists(dminfo, ownedListField, 

-                         ownedDecomposed, neededListField, 

-                            neededDecomposed, offSetField)

+mpas_block_creator_build_0_and_1halo_edge_fields

+mpas_block_creator_build_edge_halos

 \end{lstlisting}

-where ownedListField and neededListField are pointers to linked lists of 1d

-integer fields, ownedDecomposed and neededDecomposed are logical flags

-determining if the *ListField is decomposed using mpas\_block\_decomp or if

-there is one block per processor from the field, and offSetField represents a

-pointer to a linked list of 0d integer fields determining each blocks receive

-list offset.

-The two major differences here are that the input data are given as fields

-rather than arrays, and the send/recv/copy lists are not output separate from

-the fields instead they are stored within the field structure. However these

-can be modified to put the send/recv/copy lists within the block rather than

-the field.

+As mentioned in the cell routines section, the 0 and 1 halos of edges and vertices should be set up before the 1+ halos of cells.

-mpas\_input\_state\_for\_domain also has to be modified in order to setup the

-fields and blocks as required for mpas\_dmpar\_get\_owner\_list, and to create

-multiple blocks. Currently it is writing under the assumption that only one

-block per process exists, and has that single block hard-coded as the only one

-that gets created.

+\subsection{Utility Routines}

+Finally, two routines are provided to finalize block initialization and to re-index all fields in a block:

-In addition to the currently in place changes that need to be made, several

-routines need to be added. Within the current prototyping work, a routine has

-been created to determine the all cell indices for a block, including all halo

-cells. It has been written under the assumption that there could be an

-arbitrary number of halos. This routine is called

 \begin{lstlisting}[language=fortran,escapechar=@,frame=single]

-mpas_get_halo_cells_and_exchange_lists(dminfo, nHalos, 

-              indexToCellID_0Halo, nEdgesOnCell_0Halo, 

-              cellsOnCell_0Halo, indexToCellID_nHalos, 

-              nEdgesOnCell_nHalos, cellsOnCell_nHalos)

+mpas_block_creator_finalize_block_init

+mpas_block_creator_reindex_block_fields

 \end{lstlisting}

-In addition to these changes, the dimension variable nCellsSolve will now refer

-to an array. This array will contain the index to the end of a given halo. For

-example, if one wanted to do some computation over all 0 halo cells (ie. owned

-cells) the max index would be nCellsSolve(1), while computations to the end of

-the 1 halo would be nCellsSolve(2). In order to accommodate this, the

-indexToCellID array will also be packed appropriately, meaning the first

-nCellsSolve(1) indices will all be 0 halo cells while the next

-nCellsSolve(2)-nCellsSolve(1) cells will be the 1 halo cells and continuing

-until the max halo number is reached.

+These routines complete the block creation process and provide the rest of the model with a list of fully set up blocks to compute on.

-Because this routine relates specifically to cells, within the routine exchange

-lists are created for cells and stored within each block's cellsToSend,

-cellsToRecv, and cellsToCopy variables. This is done to keep in line with the

-previously identified requirement, and because the exchange lists that are

-actually used belong within this structure.

+\section{mpas\_io\_input changes}

+Within mpas\_io\_input, the main routine mpas\_input\_state\_for\_domain has been trimmed significantly to make use of the new mpas\_block\_creator module. The portions of the routine that could not be moved into mpas\_block\_creator were moved to new routines within mpas\_io\_input.

-One general change that has to be made in order to support these field data

-types being used in the input stage of MPAS is the addition of a deallocate

-field routine. This routine would be used to deallocate all fields within a

-field linked list. It is used when a field is created that's not a member of a

-block, so calling mpas\_deallocated\_block would not destroy all the memory

-associated with the field.

+This module also makes use of the new communication routines from mpas\_comm, as well as some routines in mpas\_block\_decomp.

-In addition to the changes listed here, routines still need to be determined to

-create a list of vertices and edges for a block and all it's halos, as well as

-their respective exchange lists. After the list of cells, vertices, and edges

-are complete for a block the IO read fields can be called to setup the fields

-within each block. Finally, the global indices within a block need to be

-modified to be local indices. 

+The new routines provided to clean up mpas\_input\_state\_for\_domain are:

+\begin{lstlisting}[language=fortran,escapechar=@,frame=single]

+mpas_io_setup_cell_block_fields

+mpas_io_setup_edge_block_fields

+mpas_io_setup_vertex_block_fields

+\end{lstlisting}

+These routines read in contiguous chunks of data that can then be communicated between processors.

-As mentioned in the requirements section, a large portion of these changes

-should be so they are isolated from the remainder of MPAS. In order to meet

-this requirement, a new module will be created named mpas\_block\_creator. 

+\section{mpas\_grid\_types changes}

+Within mpas\_grid\_types, utility routines have been added to deallocate fields. These are used within mpas\_io\_input for fields that are linked together like blocks but are not part of the block data structure.

-The creation of this new module should allow a reorganization of the input and

-output routines. It should also allow the code writing for the creation of

-blocks to be more transparent to other MPAS developers.

+In addition to these utility routines, a pointer named provis was added to block\_type. provis is intended to be a scratch state that time integration routines requiring an additional time level can use without modifying the number of time levels in state.

-One issue that comes up with the creation of this new module, is that MPI calls

-are now required within a module that's external to mpas\_dmpar. Previously all

-MPI calls were isolated within the dmpar module, however because of some

-circular dependency issues that come up now this is no longer the case. Now,

-MPI calls can be restricted to the dmpar module and the block creation module. 

+\section{Namelist changes}

+With the addition of multiple blocks to the MPAS framework, the decomposition namelist section is now used. It provides the following namelist options:

+\begin{lstlisting}[language=fortran,escapechar=@,frame=single]

+config_block_decomp_file_prefix

+config_number_of_blocks

+config_explicit_proc_decomp

+config_proc_decomp_file_prefix

+\end{lstlisting}

+The two prefix options specify the prefixes of the decomposition graph files used to create blocks and to determine which blocks are owned by which processors. config\_number\_of\_blocks sets the number of blocks the simulation is run with, and config\_explicit\_proc\_decomp is a logical flag that determines whether MPAS reads a graph file describing how blocks are distributed among processors or assigns blocks to processors round-robin.
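When config_explicit_proc_decomp is false, blocks are assigned to processors round-robin. The sketch below assumes the simplest such rule: global block b (counted from 0) goes to processor mod(b, nProcs). The function names are hypothetical, and the exact rule used by mpas_block_decomp may differ.

```fortran
! Sketch (assumption) of a round-robin block-to-processor mapping.
module block_assignment_sketch
  implicit none
contains

  ! Processor that owns global block globalBlockID (both counted from 0).
  integer function owning_proc(globalBlockID, nProcs)
    integer, intent(in) :: globalBlockID, nProcs
    owning_proc = mod(globalBlockID, nProcs)
  end function owning_proc

  ! Position (from 0) of the block among the blocks on its owning processor.
  integer function local_block_id(globalBlockID, nProcs)
    integer, intent(in) :: globalBlockID, nProcs
    local_block_id = globalBlockID / nProcs
  end function local_block_id

  ! Number of blocks processor procID receives when nBlocks are dealt out.
  integer function blocks_on_proc(procID, nBlocks, nProcs)
    integer, intent(in) :: procID, nBlocks, nProcs
    blocks_on_proc = nBlocks / nProcs
    if (mod(nBlocks, nProcs) > procID) blocks_on_proc = blocks_on_proc + 1
  end function blocks_on_proc

end module block_assignment_sketch
```

Under this rule, config_number_of_blocks = 8 on 2 processors gives each processor 4 blocks. When the blocks do not divide evenly, the lowest-numbered processors receive one extra block each.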

+

+In addition to the decomposition section changes, each model section (such as sw\_model) now has a new namelist option:

+

+\begin{lstlisting}[language=fortran,escapechar=@,frame=single]

+config_num_halos

+\end{lstlisting}

+

+This namelist option determines how many halo layers of cells each block has. The number of halo layers for edges and vertices is one more, nHaloLayerCells + 1, i.e. config\_num\_halos + 1.

+

 \chapter{Testing}

 **NOTE**

 All of the testing described in this section relates only to the ocean core.

 Other core developers may test this with similar procedures but different

 simulations. \\

-

 The end goal from this project is to provide a framework that allows

 bit-for-bit reproduction of data using an arbitrary combination of blocks and

 processor numbers.

@@ -449,7 +407,8 @@

         \item Finished branch simulation run with 2 processors and 8 blocks (4 blocks per proc).

 \end{itemize}

-If all of these simulations produce bit-for-bit output then testing can move on to a set of larger scale simulations.

+If all of these simulations produce bit-for-bit output, testing can move on

+to a set of larger-scale simulations.

 \begin{itemize}

         \item Current trunk 15km simulation with 1200 processors and 1200 blocks (1 block per proc).

@@ -458,6 +417,58 @@

         \item Finished branch simulation with 24 processors and 1200 blocks (50 blocks per proc).

 \end{itemize}

-After these final four simulations show bit-for-bit output then the project can be deemed as completed.

+After these final four simulations show bit-for-bit output, the project can

+be deemed complete.

+\chapter{Appendix - Use of exchange/communication lists}

+This chapter describes the use of exchange and communication lists.

+

+Exchange lists have two uses. copyLists are described first, followed by

+send/recvLists.

+

+copyLists are attached to the block sending the data, and there is no matching

+list on the receiving end. Within a copyList, the endPointID variable

+is the local blockID of the block on the other end of the

+communication, nList is the total number of elements that need to be copied,

+srcList is the list of indices to take this data from out of the owning field's

+array, and destList is the list of indices to put this data into the needing

+field's array. 

+

+To use a copyList, first search over the blocks to find the block whose

+blockID matches the copyList's endPointID. Then copy the elements listed in

+srcList into the elements listed in destList, which completes the shared

+memory copy.
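The copyList procedure above can be sketched as follows. The sketch assumes blocks are held in a plain array; block_sketch is a hypothetical simplification, since MPAS keeps blocks in a linked list. Each block carries one real field and its copyList. For every copyList node, the matching block is found by comparing blockID to endPointID, and the srcList entries are then copied into the destList entries.

```fortran
! Sketch (assumption): shared-memory copies driven by copyLists.
module copy_list_sketch
  implicit none
  integer, parameter :: RKIND = selected_real_kind(12)   ! stand-in for MPAS RKIND

  type mpas_exchange_list
    integer :: endPointID
    integer :: nlist
    integer, dimension(:), pointer :: srcList => null()
    integer, dimension(:), pointer :: destList => null()
    type (mpas_exchange_list), pointer :: next => null()
  end type mpas_exchange_list

  ! Hypothetical stand-in for a block holding one 1d real field.
  type block_sketch
    integer :: blockID
    real (kind=RKIND), dimension(:), pointer :: array => null()
    type (mpas_exchange_list), pointer :: copyList => null()
  end type block_sketch

contains

  ! For each owning block, walk its copyList; for each node, search for the
  ! block whose blockID matches endPointID and copy srcList -> destList.
  subroutine local_copies(blocks)
    type (block_sketch), dimension(:), intent(inout) :: blocks
    type (mpas_exchange_list), pointer :: exchList
    integer :: iSrc, iDest, i

    do iSrc = 1, size(blocks)
      exchList => blocks(iSrc) % copyList
      do while (associated(exchList))
        do iDest = 1, size(blocks)
          if (blocks(iDest) % blockID == exchList % endPointID) then
            do i = 1, exchList % nlist
              blocks(iDest) % array(exchList % destList(i)) = &
                blocks(iSrc) % array(exchList % srcList(i))
            end do
          end if
        end do
        exchList => exchList % next
      end do
    end do
  end subroutine local_copies

end module copy_list_sketch
```

In this sketch the copies can run in any order, because srcList entries are owned (0 halo) elements and destList entries are halo elements. No copy therefore reads a value that another copy writes.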

+

+The second use case relates to send/recvLists. sendLists are attached to

+sending blocks, while recvLists are attached to receiving blocks. Within both

+of these, endPointID refers to a processor id, and nList refers to the number

+of elements a specific block should expect to communicate. In a sendList,

+srcList describes the indices to pull data out from the owning field's array

+while destList describes the indices to put that data into a communication

+list's buffer. In a recvList, srcList describes the indices to pull the data

+out of the buffer, while destList describes the indices to put that data into

+the needing field's array.

+

+In order to use these, each side of the communication does something different.

+Before describing the use of send and recv lists, communication lists need to

+be explained.

+

+A communication list describes aggregated communications between processors.

+These provide an easy to use framework to allow communications to occur

+processor by processor as opposed to block by block. Since communication lists

+only relate to MPI communications, the only ID within the type is procID. nList

+refers to the number of elements in the buffer, rbuffer and ibuffer are

+pointer arrays (allocated only when the communication is set up) that hold the

+reals or integers being communicated, and reqID stores the MPI request ID used

+when calling MPI\_Wait.

+

+To perform a send and receive, a processor first needs to build the buffers,

+that is, the communication lists. The fields involved in the communication are

+looped over, and the total number of elements exchanged with each processor is

+accumulated in order to build a communication list for that processor. The

+communication list buffers are then allocated according to their nList

+variables. Buffers need to be created on both the sending and receiving side.

+After the buffers are created, the sending field copies all of its data into

+the buffer. The buffer is then sent, and on the receiving end the receiving

+field unpacks all of the data into its arrays.

 \end{document}

</font>

</pre>