[Go-essp-tech] [Ncpp_tech] Close to final proposal for global attributes for downscaled datasets

Fri Apr 5 14:04:04 MDT 2013

Hi all, 

I was only recently asked to join the NCPP discussion, so my apologies if any
of this has already been covered.

I've had a look at the proposed global attribute table for CORDEX, and based on
our experience with NARCCAP, I have two recommendations to make:

1) Add a "version" attribute.

There will be errors in output production, and data producers will end up
creating multiple versions of the same model output files.  The tracking_id
attribute would cover this issue to some extent, but a version number can carry
more useful information.  If versions have major and minor components, where
the minor version is incremented when metadata or ancillary data is corrected
and the major version is incremented when the data in the main data variable
changes, then it's easy to compare two files and see not just whether they're
different, but whether those differences matter.

For example, suppose I have u-wind and v-wind, and I want to calculate wind
speed.  If u-wind is version 1.2 and v-wind is version 1.4 (say because some
typos were corrected in the metadata), I know that the data in both files
matches, and that if I'm going to copy metadata from an input file, I should
use the v-wind file, not the u-wind file. But if u-wind is version 2.0 (say
because a month of bad data was filled in between v1 and v2), then I know I
probably need to check whether v-wind has an update as well.
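To make the comparison concrete, here is a minimal sketch. It assumes the
proposed (hypothetical) "version" attribute holds a "major.minor" string with
the meaning described above; the function names are illustrative only.

```python
# Sketch only: assumes a hypothetical "version" global attribute of the form
# "major.minor", where a minor bump = metadata/ancillary fix and a major
# bump = change to the data in the main data variable.

def parse_version(v: str) -> tuple[int, int]:
    """Split a 'major.minor' string such as '1.2' into (1, 2)."""
    major, minor = v.split(".")
    return int(major), int(minor)


def same_data(v_a: str, v_b: str) -> bool:
    """True if two files carry the same main-variable data (equal major)."""
    return parse_version(v_a)[0] == parse_version(v_b)[0]


def metadata_source(files: dict[str, str]) -> str:
    """Among files with the same data, return the one with the newest
    version, i.e. the most corrected metadata to copy from."""
    return max(files, key=lambda name: parse_version(files[name]))


winds = {"u-wind": "1.2", "v-wind": "1.4"}
print(same_data(winds["u-wind"], winds["v-wind"]))  # True
print(metadata_source(winds))                       # v-wind
print(same_data("2.0", "1.4"))                      # False: check v-wind too
```

Parsing into integers rather than comparing strings matters once minor
versions reach two digits: 1.10 should sort after 1.4.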

It's also important to document the differences between versions, but I'm not
sure that file metadata is the right place to do that.  I'm also not sure
whether version should replace tracking_id or supplement it.  I lean towards
replacing it, just because tracking_id is likely not to be updated every time
it ought to be, and version will be more stable against those kinds of changes,
but I don't know all the uses that are planned for tracking_id.

2) Don't use capital letters in the attribute names.

The difference between "Experiment_ID", "experiment_ID" and "experiment_id"
will become completely invisible to your eye when you're looking at the file
headers, trying to figure out why the checker says you don't have a valid
experiment_ID attribute when you can plainly see that experiment_id is set to
"historical".  It will cause a lot of avoidable frustration and headaches.

At the very least, we should make capitalization follow a consistent pattern
from attribute to attribute, but I favor no capitalization at all because
that's the pattern that's easiest to explain and remember and therefore will be
applied most consistently.
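A checker could catch this kind of mix-up directly by flagging attribute names
that are not all lowercase and names that collide when case is ignored. A
minimal sketch, assuming the global attributes have already been read into a
plain dict (reading them from the netCDF file is omitted here); the function
name is hypothetical:

```python
from collections import defaultdict


def case_problems(attrs: dict[str, object]) -> list[str]:
    """Report attribute names that are not lowercase, and names that
    differ only in capitalization."""
    problems = []
    groups = defaultdict(list)  # lowercased name -> names as written
    for name in attrs:
        groups[name.lower()].append(name)
        if name != name.lower():
            problems.append(f"not lowercase: {name}")
    for names in groups.values():
        if len(names) > 1:
            problems.append("case-only duplicates: " + ", ".join(sorted(names)))
    return problems


attrs = {"experiment_id": "historical", "experiment_ID": "", "model_id": "x"}
for problem in case_problems(attrs):
    print(problem)
```

With an all-lowercase rule, the first check alone is enough; the second check
is still useful for explaining why a file "looks" right but fails validation.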

Otherwise, I think it all looks pretty good.

Cheers,

--Seth

On Thu, 04 Apr 2013 18:06:56 -0600
 galina <galina at rap.ucar.edu> wrote:
>Dear all,
>
>After our teleconference on 03/20 and the email exchange, suggestions and
>questions posed by Karl Taylor, Laura Carriere and Martin Juckes, there was a
>need to prepare a final proposal for the set of global attributes to be used
>when publishing statistically downscaled data in ESGF.
>
>In order to standardize the elements of the DRS for publishing downscaled
>datasets, Joe Barsugli and Galia Guentchev evaluated the global attributes used
>by CORDEX, the additions/corrections proposed by Karl Taylor, and the initially
>proposed global attributes for publishing downscaled datasets. We prepared
>a divergence table that summarizes all of these details; the table is attached
>to this email. The last column contains a final proposal of the global
>attributes for consideration. At the bottom of this table we also added some
>NEW global attributes that we consider important to include.
>
>The differences from the proposal we discussed and agreed on during the last
>teleconference (03/20) stem from a desire to standardize and to be as
>consistent as possible with CORDEX and, to an extent, with the CMIP5 DRS
>requirements.
>
>The main difference is our approach of using standard global attributes
>such as experiment_ID and model_ID to describe the downscaling
>characteristics (the experiment, and the statistical or dynamical downscaling
>model). The main reason for using model_ID is that users most often look for
>data under model_ID, and that is where we would like them to be able to find
>all of the downscaled data as well. To describe the global attributes
>pertaining to the global model that was used as a predictor or as a driving
>model (in dynamical downscaling), we propose using the descriptor "driving"
>to distinguish these specific global attributes.
>
>Also, instead of having a separate sub-project, we propose that this
>information be included in experiment_ID (which in this proposal is
>intended to describe the downscaling experiment and the downscaling setting).
>We also include a "perfectModel" descriptor in the "product" global attribute
>to indicate the applicability of the downscaled data to impact applications,
>although Aparna has already sent a comment raising a concern about this last
>proposed addition.
>
>Please take a look at the table and send any comments or suggestions that you
>might have. We would like to expedite the decision on the list of final global
>attributes, so that the NASA team would be able to publish their data soon.
>We are hoping to reach an agreement on the final set of global attributes via
>email within the next few days.
>
>
>Best regards,
>Galia