[mpas-developers] DESIGN DOCUMENTS: Field Statistics, Run-time I/O, Auto-Documentation

Doug Jacobsen jacobsen.douglas at gmail.com
Thu Feb 28 19:45:43 MST 2013


Hey Michael,

Thanks for the comments.

So, I've played a little bit with an XML parsing library for C, which is
very lightweight and easy to use. The library consists of only 2 files
(ezxml.c and ezxml.h), and they are under a license that lets us include
them in our project. My idea was to simply add them to the Registry
directory and have them build when Registry builds. I do agree with the
rest of 1) though: if we don't use a standard, easily parseable format,
we will have to write our own parser, similar to what is in Registry
currently. Another potential issue is that I think very few people
actually know how Registry works and are comfortable enough to modify it
if they find a bug, or even to debug it. Making use of a standard parser
could make it somewhat easier to understand.
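
To give a rough sense of how little code a standard parser needs, here is
a minimal sketch. It uses Python's xml.etree.ElementTree rather than
ezxml, purely for illustration, and the element and attribute names
(registry, var_struct, var, name, type, dimensions) are assumptions, not
the layout from the design document.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry fragment. Element and attribute names are
# illustrative assumptions, not the proposed Registry layout.
SAMPLE = """
<registry>
  <dim name="nCells"/>
  <dim name="nVertLevels"/>
  <var_struct name="state">
    <var name="temperature" type="real" dimensions="nVertLevels nCells Time"/>
    <var name="xtime" type="text" dimensions="Time"/>
  </var_struct>
</registry>
"""


def load_fields(xml_text):
    """Return a list of (struct, name, type, dims) tuples from the XML text."""
    root = ET.fromstring(xml_text.strip())
    fields = []
    for struct in root.iter("var_struct"):
        for var in struct.iter("var"):
            dims = var.get("dimensions", "").split()
            fields.append(
                (struct.get("name"), var.get("name"), var.get("type"), dims)
            )
    return fields


if __name__ == "__main__":
    # Print one line per field, in the order it appears in the file.
    for struct, name, ftype, dims in load_fields(SAMPLE):
        print(f"{struct}%{name}: {ftype} ({', '.join(dims)})")
```

The list of tuples stands in for whatever data structures Registry would
build before generating the Fortran code.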

I'm not sure if the ezxml library I've been using is easily compilable on
a BlueGene/Q, but I have a small sample program I can send you to test if
you would like. There are a large number of XML parsing libraries for C for
us to choose from, and most seem to be under licenses that allow us to
include them in MPAS.

As for 2), my idea was to write a script to parse Registry and reformat
it into whatever format we end up using. I'm pretty sure this would be easy
to do, and I have been thinking I would do this as part of the project
anyway.
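
A rough sketch of what such a conversion script could look like is below.
The legacy line format it reads ("dim <name>" and "var <type> <name>
( <dims> ) <struct>", with "%" comments) is a simplified assumption made
up for this example, as are the XML element names. The real Registry has
more columns per line.

```python
import xml.etree.ElementTree as ET

# Simplified, assumed legacy format (the real Registry has more columns):
#   dim <name>
#   var <type> <name> ( <dim> ... ) <struct>
LEGACY = """\
% comment lines are assumed to start with a percent sign
dim nCells
dim nVertLevels
var real temperature ( nVertLevels nCells Time ) state
var text xtime ( Time ) state
"""


def convert(legacy_text):
    """Build an XML tree from the assumed legacy registry text."""
    root = ET.Element("registry")
    structs = {}  # struct name -> its <var_struct> element
    for raw in legacy_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        tokens = line.split()
        if tokens[0] == "dim":
            ET.SubElement(root, "dim", {"name": tokens[1]})
        elif tokens[0] == "var":
            open_idx, close_idx = tokens.index("("), tokens.index(")")
            struct_name = tokens[close_idx + 1]
            if struct_name not in structs:
                structs[struct_name] = ET.SubElement(
                    root, "var_struct", {"name": struct_name}
                )
            ET.SubElement(
                structs[struct_name],
                "var",
                {
                    "name": tokens[2],
                    "type": tokens[1],
                    "dimensions": " ".join(tokens[open_idx + 1:close_idx]),
                },
            )
        else:
            raise ValueError(f"unrecognized registry line: {raw!r}")
    return root


if __name__ == "__main__":
    print(ET.tostring(convert(LEGACY), encoding="unicode"))
```

Raising on unrecognized lines is a deliberate choice here: a one-time
conversion should fail loudly rather than silently drop a variable.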

I agree that the editor and CESM arguments are rather unconvincing; they
are just smaller things that XML has in its favor over JSON. But the
primary reasons I'm in favor of XML are that it's a standard format, we
can validate it (for error checking), and core developers can easily
augment the data in their Registry file as they require. Also, there is
an abundance of parsers for these standard formats, so we can leverage
work others have done when generating our documentation and the Fortran
code for use in MPAS.
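
To make the validation point a bit more concrete, here is a minimal
sketch of the kind of check I have in mind. Python's standard library
has no XML Schema validator, so a small hand-written check stands in for
a real schema here. The required attribute names and the rule that every
dimension must be declared are assumptions for the example, not settled
parts of the design. Extra attributes (units, description, and so on)
pass through untouched, which is the "cores can augment the data" part.

```python
import xml.etree.ElementTree as ET

# Assumed required attributes per element. A real schema would define these.
REQUIRED = {"dim": {"name"}, "var": {"name", "type", "dimensions"}}


def validate(xml_text):
    """Return a list of error strings; an empty list means the file passed."""
    root = ET.fromstring(xml_text)
    declared = {d.get("name") for d in root.iter("dim")}
    errors = []
    for elem in root.iter():
        # Report required attributes that are absent; extras are allowed.
        missing = REQUIRED.get(elem.tag, set()) - set(elem.attrib)
        for attr in sorted(missing):
            errors.append(f"<{elem.tag}> is missing required attribute '{attr}'")
        if elem.tag == "var":
            # Every dimension a variable uses must have a matching <dim>.
            for dim in elem.get("dimensions", "").split():
                if dim not in declared:
                    errors.append(
                        f"var '{elem.get('name')}' uses undeclared dimension '{dim}'"
                    )
    return errors


# A valid file carrying extra, core-specific attributes.
GOOD = (
    '<registry><dim name="nCells"/>'
    '<var name="area" type="real" dimensions="nCells" '
    'units="m^2" description="cell area"/></registry>'
)
# An invalid file: no type, and nEdges is never declared.
BAD = (
    '<registry><dim name="nCells"/>'
    '<var name="h" dimensions="nCells nEdges"/></registry>'
)

if __name__ == "__main__":
    print(validate(GOOD))
    for err in validate(BAD):
        print(err)
```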

Doug


On Thu, Feb 28, 2013 at 7:17 PM, Michael Duda <duda at ucar.edu> wrote:

> Hi, Doug.
>
> Thanks for the clarification. I'm less enthusiastic about JSON,
> especially since it apparently doesn't support comments, but I'm still
> unsure whether XML is the best tool for the job.
>
> Here are a couple of questions that I'm wondering about.
>
> 1) If we don't use an XML parsing library in our registry code, we'll
> need to write our own parser; otherwise, that would add an extra library
> dependency in MPAS. If people are having problems compiling PIO, would
> they also have problems compiling an XML library? Is it easy to get such
> a library on, e.g., a BlueGene/Q?
>
> 2) We currently have just over 400 variables in the MPAS-A registry.
> How will we convert these to XML format? This is actually an issue for
> any significantly different registry format, not just XML, and I'd guess
> that the solution would be to write a program to convert registry
> formats. Could we then just provide users with this tool for converting
> our own format to XML?
>
> I'm not (yet?) convinced by the arguments that we should use XML because
> CESM does and because there are many XML editors that work on Linux. In
> my mind, the primary argument in favor of XML is that employing a widely
> supported format would make it easier for users to utilize the
> information in the registry for purposes that we've not yet envisioned.
>
> Michael
>
>
> On Thu, Feb 28, 2013 at 06:18:47PM -0700, Doug Jacobsen wrote:
> > Hey Michael,
> >
> > So, the main request from my design document is somehow augmenting the
> > information in Registry with additional information that the actual
> > Registry parser wouldn't use. The main use for this extra information is
> > allowing Registry-defined attributes that can be written into
> > fields/streams to help with CF Compliance. The additional use is that we
> > can parse this extra information to help generate our users guides.
> >
> > But about your questions....
> >
> > I am suggesting the use of either XML or JSON, mostly because they are
> > standard, pre-defined formats. XML at least is already in use by CESM,
> > so it seems like CESM users would not have a difficult time editing
> > Registry (if they wanted to).
> >
> > These formats also allow cores to append arbitrary information to
> > Registry depending on what their requirements are, and allow us to
> > specify a Schema that can be used to define the required attributes in
> > Registry, and then verify the validity of the Registry file.
> >
> > So, I think that the answers are yes to 1), but no to 2), and regardless
> > of what we do end up having as our format we can write a parser. Though
> > if we do use a standard format, we can make use of pre-existing parsers.
> >
> > I think largely the format won't affect developers much, except when they
> > have to edit Registry. The Fortran code that gets generated should be
> > exactly the same as it is now.
> >
> > Let me know if you have any other questions or concerns.
> >
> > Thanks,
> > Doug
> > On Feb 28, 2013 5:42 PM, "Michael Duda" <duda at ucar.edu> wrote:
> >
> > > Hi, Doug (and others).
> > >
> > > Just to check my understanding of the issues, are the arguments for
> > > moving to either XML or JSON that
> > >
> > > 1) because these are widely supported formats, there would be tools
> > > available to automatically generate documentation from our registry
> > > files if they were to be in one of these formats; and
> > >
> > > 2) if we were to employ an XML or JSON parsing library in the registry
> > > code, we could easily parse the registry file into data structures,
> > > from which we could then generate Fortran code as we do now?
> > >
> > > Are there other arguments in favor of either of these formats?
> > >
> > > Michael
> > >
> > >
> > > On Thu, Feb 28, 2013 at 03:29:27PM -0700, Doug Jacobsen wrote:
> > > > Hi again everyone,
> > > >
> > > > To try and help fuel the discussion about the format of Registry,
> > > > I've decided to try and mock up two versions of Registry. There is
> > > > the current proposal in the design document for XML, but I slightly
> > > > modified it in a way that makes it more closely mirror what's
> > > > currently in the code. Another developer here recommended I look at
> > > > JSON as well, so I put together an example of the same XML Registry,
> > > > but in the JSON format.
> > > >
> > > > These are not complete versions of what they would look like in the
> > > > end, but I tried to give examples of at least everything I wanted to
> > > > have in the file in the end. Please take a bit and download the two
> > > > files. You can open them up in your favorite editor and look around
> > > > to see if you like or dislike either of them.
> > > >
> > > > One thing to note, I'm not super familiar with JSON so some of the
> > > > formatting might be incorrect in the version I sent you. So there
> > > > might be some small errors, but I think largely it's correct.
> > > >
> > > > Some small notes:
> > > > One fairly large benefit to using either of these two formats is
> > > > that we can define a JSON or XML schema and do validation checks on
> > > > our Registry file prior to using it.
> > > > One fairly large negative to the JSON format is that you can't write
> > > > comments. The only way to write comments is to define an unused
> > > > key:value pair that is your comment.
> > > >
> > > > Again, any questions or comments are appreciated.
> > > >
> > > > Thanks,
> > > > Doug
> > > >
> > > >
> > > > On Tue, Feb 26, 2013 at 10:43 AM, Doug Jacobsen
> > > > <jacobsen.douglas at gmail.com> wrote:
> > > >
> > > > > Hello Everyone,
> > > > >
> > > > > There are two design documents attached (and also recently
> > > > > committed to the repo). I put two in this email because one of them
> > > > > requires the other, and provides something to consider in the first.
> > > > >
> > > > > First, is the run-time I/O document that also includes
> > > > > auto-documentation. This requires some rather significant
> > > > > modifications to Registry, with the end goal being a verbose format
> > > > > for Registry that allows documentation of fields, namelists, and
> > > > > dimensions to be written into the Registry file. I have provided an
> > > > > XML proposal for Registry conversion in this document, but this
> > > > > could be a different format. This format will also be used to
> > > > > enforce CF compliance in our output files (writing out field level
> > > > > attributes and what-not).
> > > > >
> > > > > This document also includes a description (although rough currently)
> > > > > of an auto-documentation parser script. Currently I have a script
> > > > > written in Python that parses Registry and an additional
> > > > > documentation file (as a test) that generates tables and sections
> > > > > that will be included in our users guide.
> > > > >
> > > > > One of the main short term benefits of this project is that it
> > > > > allows ease of documentation. However the second benefit is the
> > > > > creation of a run-time I/O layer. This allows the creation of
> > > > > streams at compile time, and the modification of what fields are in
> > > > > each stream at run-time. This makes use of another namelist file
> > > > > (described in the document) to make configuration easy.
> > > > >
> > > > > Second, is the field statistics module design document. This
> > > > > provides a description of a generic module that can be used to
> > > > > compute time averages and field moments. Time averages and moments
> > > > > of fields can be specified at run-time, with the implementation of
> > > > > the run-time I/O layer.
> > > > >
> > > > > Both of these projects are rather large, and some of our
> > > > > documentation requires the first project. My hope is to begin to
> > > > > work on this project within the next few weeks so I would like to
> > > > > get at least the first document solidified sooner rather than later.
> > > > >
> > > > > So, please let me know if you have any questions or comments,
> > > > > especially regarding the format of Registry.
> > > > >
> > > > > Thanks!
> > > > > Doug
> > > > >
> > >
> > >
> > >
> > > > _______________________________________________
> > > > mpas-developers mailing list
> > > > mpas-developers at mailman.ucar.edu
> > > > http://mailman.ucar.edu/mailman/listinfo/mpas-developers
> > >
> > >
>