[mpas-developers] MPAS I/O requirements and design doc

Jones, Philip W pwjones at lanl.gov
Tue Feb 28 09:51:44 MST 2012


Michael,

I finally read your I/O doc and have several comments/suggestions.

For requirements, I would add:
   - requirement for exact restart (any simulation interrupted by a restart must be bitwise identical to a simulation without restart)
   - for CESM, netCDF CF conventions are required for output files (mostly certain required metadata; I don’t think we need to be bound by CF grid conventions yet since they are a bit onerous, especially for unstructured grids)
   - requirement to be able to specify the number of I/O tasks; this will be important for tuning the I/O layer to the underlying architecture
   - requirement to write many different types of fields, including scalars and various multi-dimensional arrays (e.g., meridional diagnostics, transports, surface fields), some of which we are not likely to want in the registry (temporary derived diagnostic fields).  Note that even the scalars count as fields since they may have associated metadata (e.g., transport diagnostics with the name/units/possible location info for the transport).

For global attributes, you will need a more complete stream attribute layer, with each stream holding an arbitrary number of attributes.  Some of these may be standard (e.g., conventions, history).  You will need not only to read/write attributes but also to add them to and remove them from a stream.
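
A minimal sketch of such a stream attribute layer, written in Python purely for illustration (MPAS itself is Fortran). The class name `Stream`, its method names, and the attribute values are all hypothetical and are not taken from the design doc:

```python
class Stream:
    """Hypothetical stream holding an arbitrary set of global attributes."""

    def __init__(self, name):
        self.name = name
        self._attrs = {}  # attribute name -> value, kept in insertion order

    def add_attribute(self, key, value):
        # Adding an existing key simply overwrites the old value.
        self._attrs[key] = value

    def remove_attribute(self, key):
        # Removing a missing attribute is treated as a no-op.
        self._attrs.pop(key, None)

    def get_attribute(self, key, default=None):
        return self._attrs.get(key, default)

    def attribute_names(self):
        return list(self._attrs)


# Standard attributes plus one that is later removed (values are illustrative).
stream = Stream("history")
stream.add_attribute("Conventions", "CF-1.6")
stream.add_attribute("history", "created by MPAS")
stream.add_attribute("scratch_note", "temporary")
stream.remove_attribute("scratch_note")
```

The point is only that the set of attributes is open-ended and editable after creation, rather than a fixed list of slots in the stream type.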

Similarly, you will need a more robust attribute layer for fields – you can either do this within the Field data type or define a new IOField data type that includes arrays of attributes together with the field.  Basically, you will need to allow the user to define an arbitrary number of attributes of any kind (string, int, real, double, logical) for a given field, including standard attributes like short name, long name, units, valid range, undefined, etc.  Having this as part of the field layer can be beneficial for defining many attributes only once near start-up or reusing the same field in many different streams.
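
One possible shape for the IOField option, sketched in Python for illustration only. The class `IOField`, the `set_attr` method, and the attribute values are assumptions; Python also does not distinguish real from double, so both map to `float` here:

```python
# Scalar types allowed as attribute values in this sketch.
ALLOWED_TYPES = (str, int, float, bool)


class IOField:
    """Hypothetical field wrapper carrying arbitrary typed attributes."""

    def __init__(self, short_name, **attrs):
        self.short_name = short_name
        self.attrs = {}
        for key, value in attrs.items():
            self.set_attr(key, value)

    def set_attr(self, key, value):
        # Accept a scalar or a list/tuple of scalars (e.g. a valid range).
        values = value if isinstance(value, (list, tuple)) else [value]
        if not all(isinstance(v, ALLOWED_TYPES) for v in values):
            raise TypeError("unsupported attribute type for %r" % key)
        self.attrs[key] = value


# Attributes are defined once, near start-up ...
temperature = IOField(
    "temperature",
    long_name="potential temperature",
    units="degC",
    valid_range=(-5.0, 50.0),
)

# ... and the same field object is then reused in several streams.
history_fields = [temperature]
restart_fields = [temperature]
```

Because both streams hold a reference to the same object, the metadata lives in one place and cannot drift between output files.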

You might also want to separate the dimension layer (you already sort of have this in the io_info within Field) so you can separately define dimensions and attach them to fields.  CESM will eventually want an unlimited (time) dimension too, but we can worry about that later.
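
A sketch of what a separate dimension layer could look like, again in Python for illustration. `Dimension` and `FieldDecl` are hypothetical names, the sizes are made up, and an unlimited dimension is represented here simply by a size of `None`:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Dimension:
    """Hypothetical dimension defined independently of any field."""
    name: str
    size: Optional[int]  # None marks an unlimited (e.g. time) dimension

    @property
    def is_unlimited(self):
        return self.size is None


@dataclass
class FieldDecl:
    """Hypothetical field declaration that only references dimensions."""
    name: str
    dims: Tuple[Dimension, ...]

    def shape(self, unlimited_len=1):
        # The unlimited dimension takes whatever length is current.
        return tuple(unlimited_len if d.is_unlimited else d.size
                     for d in self.dims)


# Dimensions are defined once ...
time = Dimension("Time", None)
n_cells = Dimension("nCells", 40962)
n_vert = Dimension("nVertLevels", 26)

# ... and attached to as many fields as need them.
temperature = FieldDecl("temperature", (time, n_cells, n_vert))
surface_pressure = FieldDecl("surface_pressure", (time, n_cells))
```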

For netCDF (or other self-defining formats), you will need multiple phases, especially for writing:
  - creating the fields and adding all attributes
  - “defining” the fields and dimensions (netCDF has a separate define phase that writes all attributes and prepares for the binary portion and it’s very inefficient to jump in and out of the define phase)
  - writing the field (generally writing the actual data since most metadata is written during the define phase)

For performance/memory reasons, you don’t want to have to gather/copy data during the first two phases since you’ll need to define all fields up front.  So the field should probably use a pointer with which you can point to the actual binary data just before reading/writing.
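
The three phases and the late data pointer can be sketched as follows. This is an in-memory mock in Python, not netCDF or PIO code; `SketchWriter` and all of its methods are hypothetical names chosen for this illustration:

```python
class SketchWriter:
    """Hypothetical writer with create, define, and write phases."""

    def __init__(self):
        self.fields = {}       # name -> {"attrs": dict, "data": reference}
        self.defined = False
        self.define_calls = 0  # counts how often define mode is left
        self.written = {}

    def create_field(self, name, **attrs):
        # Phase 1: metadata only, no data is touched or copied.
        if self.defined:
            raise RuntimeError("cannot create fields after the define phase")
        self.fields[name] = {"attrs": dict(attrs), "data": None}

    def end_define(self):
        # Phase 2: all fields are defined together, in one pass.
        self.defined = True
        self.define_calls += 1

    def attach_data(self, name, data):
        # Stores a reference to the caller's buffer, not a copy.
        self.fields[name]["data"] = data

    def write(self, name):
        # Phase 3: the actual data is read through the reference.
        if not self.defined:
            raise RuntimeError("define phase has not been completed")
        data = self.fields[name]["data"]
        if data is None:
            raise RuntimeError("no data attached to %r" % name)
        self.written[name] = list(data)


writer = SketchWriter()
writer.create_field("temperature", units="degC")
writer.create_field("salinity", units="psu")
writer.end_define()

buffer = [0.0, 0.0]
writer.attach_data("temperature", buffer)
buffer[0] = 1.5  # the model updates its own array after attaching
writer.write("temperature")
```

Since the writer holds only a reference, the value written is whatever the model array contains at write time, and nothing is gathered or copied while fields are being created and defined.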

Are we going to support straight binary format?  If so, we’ll need a design for how we store metadata in those files.  Note that I don’t think this is a strong need any more (netCDF cannot always guarantee exact restart if an architecture isn’t using a form of IEEE binary format, but that doesn’t happen very often anymore).
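
To make the exact-restart concern concrete, here is a small Python sketch of a bitwise comparison. It assumes IEEE 754 doubles, which is what Python's `struct` format `d` packs; the helper names and sample values are hypothetical, and the truncated-text round trip is only a stand-in for any representation that does not preserve every bit:

```python
import struct


def to_bits(values):
    """Pack floats as little-endian IEEE 754 doubles (8 bytes each)."""
    return struct.pack("<%dd" % len(values), *values)


def from_bits(raw):
    """Unpack little-endian IEEE 754 doubles back into Python floats."""
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))


state = [0.1, 1.0 / 3.0, 2.0 ** -1074, 1e308]

# A binary round trip keeps every bit, so a restart would be exact.
restored = from_bits(to_bits(state))
bitwise_identical = to_bits(restored) == to_bits(state)

# A lossy representation (here: 7 significant digits of text) does not.
lossy = [float("%.6e" % v) for v in state]
lossy_identical = to_bits(lossy) == to_bits(state)
```

Comparing packed bytes rather than using a numerical tolerance is the check that matches the "bitwise identical" wording of the requirement.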

Basically, the current design is not complete enough, esp. wrt attributes, and should probably be fleshed out some more.  We can always prioritize aspects of the implementation so we can at least get multi-block I/O for registry variables, etc. up quickly.  But we’ll need all of this before too long.

Sorry to add more work, but it’s worth thinking about this stuff now.

Phil


On 2/24/12 2:08 PM, "Michael Duda" <duda at ucar.edu> wrote:

Hi, Folks.

I've been slowly working on a requirements and design document for
a new I/O layer in MPAS that will provide parallel I/O (almost
certainly to be implemented using PIO) and I/O for multiple blocks
per MPI task. The Implementation and Testing chapters are still
blank, as I first wanted to get some feedback on the requirements
and proposed design to see whether I'm headed in the right
direction.

Attached is the document and its source; if anyone has questions,
comments, or other suggestions, I'd be glad to hear them.

Thanks!
Michael


---
Correspondence/TSPA/DUSA EARTH
------------------------------------------------------------
Philip Jones                                pwjones at lanl.gov
Climate, Ocean and Sea Ice Modeling
Los Alamos National Laboratory
T-3 MS B216                                 Ph: 505-500-2699
PO Box 1663                                Fax: 505-665-5926
Los Alamos, NM 87545-1663


