[Go-essp-tech] CMOR and cell_measures issues

Karl Taylor taylor13 at llnl.gov
Mon Nov 1 16:12:36 MDT 2010


Couldn't we simply say that we certify that CMIP5 data conforms to the
CF 1.4 standard, except that the cell_measures variables may be found in
an external file rather than in the referencing file?  That way the data
will pass the CMIP5 QC checks, which don't require the cell_measures
variables to be found in the referencing file.  I think the decision
between cell_measures and ext_cell_measures should be based on which one
will be most useful to the users.  In CMIP5, users should be able to find
the cell areas even without cell_measures, so I'm not sure this decision
is all that critical.
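
To make the proposed relaxation concrete, here is a minimal sketch of
what such a QC check might look like.  It is not the CF checker's code:
the function names, the status labels and the allow_external flag are
all hypothetical, and the set of variable names is assumed to have been
read from the file by some other means.

```python
def parse_cell_measures(attr):
    """Parse a CF-style string such as 'area: areacella' into {'area': 'areacella'}."""
    tokens = attr.split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"malformed cell_measures string: {attr!r}")
    measures = {}
    for key, name in zip(tokens[0::2], tokens[1::2]):
        if not key.endswith(":"):
            raise ValueError(f"expected 'measure:' but found {key!r}")
        measures[key[:-1]] = name
    return measures


def check_cell_measures(attr, variables_in_file, allow_external=True):
    """Return (status, missing); the status labels are hypothetical.

    'ok'       : every referenced variable is in the referencing file
    'external' : some are absent, which the relaxed rule tolerates
    'error'    : some are absent and the strict rule is in force
    """
    missing = sorted(
        name for name in parse_cell_measures(attr).values()
        if name not in variables_in_file
    )
    if not missing:
        return "ok", missing
    return ("external" if allow_external else "error"), missing
```

With allow_external=False this mimics the strict reading of CF 1.4;
with the default it mimics the exception suggested above.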

regards,
Karl



On 11/1/10 2:30 PM, martin.juckes at stfc.ac.uk wrote:
>
> Hello All,
>
> Sorry to be repetitive, but I want to repeat a question I raised 
> earlier today (Monday in the UK) that hasn’t been answered yet: will 
> the proposed change to the CF checker be matched by a change to the 
> conformance document, so that CF 1.4 conformance no longer demands 
> that variables named in cell_measures be in the same file?
>
> I’ve also copied Bryan and Michael in again, so as to get a quality 
> control perspective, as it worries me that an agreement made in a 
> rush might not meet the expectations of the quality control we have 
> committed to,
>
> Regards,
>
> Martin
>
> *From:*Karl Taylor [mailto:taylor13 at llnl.gov]
> *Sent:* 01 November 2010 18:19
> *To:* Kettleborough, Jamie
> *Cc:* Bentley, Philip; V. Balaji; Juckes, Martin (STFC,RAL,SSTD); 
> go-essp-tech at ucar.edu; cmor at lists.llnl.gov; Kyle.Olivo at noaa.gov; 
> Doutriaux, Charles
> *Subject:* Re: [Go-essp-tech] CMOR and cell_measures issues
>
> Hi Jamie,
>
> I'm arguing that given that cell_measures (or ext_cell_measures) will 
> *not* appear in files containing fields most likely to be carried on a 
> mesh different  from the "primary" mesh (because I've removed those 
> from the requested output table, and hence the CMOR tables), I think 
> it is better to *assume* the remaining variables are on the "primary" 
> mesh.  I would be surprised if more than 1% of the variables written 
> will have cell_measures pointing to an incorrect area field.  If it 
> does, I assume the area variable will have different latxlon 
> dimensions than the variable itself, so it will be difficult for a 
> user to mistakenly apply the areas.
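
As a rough illustration of the safeguard described above, the sketch
below refuses to apply an area field whose lat x lon shape differs from
that of the variable.  The function names are hypothetical and the
fields are plain nested lists rather than netCDF variables.  A shape
check of this kind would not catch a staggered grid that happens to
have the same dimensions.

```python
def shape_of(grid):
    """Return (n_lat, n_lon) for a rectangular nested list, or raise ValueError."""
    n_lon = len(grid[0])
    if any(len(row) != n_lon for row in grid):
        raise ValueError("rows have different lengths")
    return len(grid), n_lon


def area_weighted_mean(field, areas):
    """Area-weighted mean of a 2-D field; rejects mismatched area fields."""
    if shape_of(field) != shape_of(areas):
        raise ValueError(
            f"area shape {shape_of(areas)} does not match field shape {shape_of(field)}"
        )
    total_area = sum(sum(row) for row in areas)
    weighted = sum(
        f * a
        for f_row, a_row in zip(field, areas)
        for f, a in zip(f_row, a_row)
    )
    return weighted / total_area
```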
>
> So rather than advocate completeness over correctness, I'd say I'm 
> advocating "almost perfect" versus "perfect".
>
> If the number of offending cases is much larger than I'm imagining, 
> please let me know.
>
> Best regards,
> Karl
>
> On 11/1/10 10:09 AM, Kettleborough, Jamie wrote:
>
> Hello Karl,
>
> thanks for this reply.  Putting aside the issue of whether this is 
> really ext_cell_measures or cell_measures then I think, given the 
> resources we have locally, we have to make a choice of correctness vs 
> completeness.  The reason we are tempted to turn off ext_cell_measures 
> is it is the least effort way we can see of submitting data that is 
> correct.  I think you are suggesting going for completeness - even if 
> we risk submitting some data with ext_cell_measures that is 
> incorrect.  Obviously this is *my* interpretation of what you are 
> saying.  Yes we can go for both correctness and completeness, but this 
> will take us some effort - we need an extra component in our system 
> that can recognise which cell areas to assign to which variables (with 
> minimum error) - and we (like everyone) have lots of demands on our 
> effort at the moment - and we have to make judgements about where to 
> prioritise.  (This isn't supposed to be a sob story - just trying to 
> explain why we are tempted...)
>
> Would you recommend 'completeness' over 'correctness' - have I 
> interpreted you correctly?  What are the options for correcting 
> incorrect meta-data once data is ingested into ESG?
>
> Jamie
>
>     ------------------------------------------------------------------------
>
>     *From:*Karl Taylor [mailto:taylor13 at llnl.gov]
>     *Sent:* 29 October 2010 21:36
>     *To:* Kettleborough, Jamie
>     *Cc:* Bentley, Philip; V. Balaji; martin.juckes at stfc.ac.uk
>     <mailto:martin.juckes at stfc.ac.uk>; go-essp-tech at ucar.edu
>     <mailto:go-essp-tech at ucar.edu>; cmor at lists.llnl.gov
>     <mailto:cmor at lists.llnl.gov>; Kyle.Olivo at noaa.gov
>     <mailto:Kyle.Olivo at noaa.gov>; Doutriaux, Charles
>     *Subject:* Re: [Go-essp-tech] CMOR and cell_measures issues
>
>     Dear Jamie and Charles (a couple of questions for you),
>
>     Hello Karl,
>
>     I think the recommended way to 'turn off' ext_cell_measures is to
>     make a call to cmor.set_variable_attribute(varid,
>     'ext_cell_measures', '').  Is that right?  We are very tempted to
>     do this for all variables - so basically overriding the MIP
>     tables.  How big a problem do you think this will be for data
>     users - our grid is pretty straightforward and users can
>     calculate cell_areas from the latitudes.
>
>
>     Yes, if the cell areas stored in areacella are not appropriate for
>     a particular field, and the requested output tables say that
>     ext_cell_measures includes areacella, then you should call the set
>     attribute function to reset ext_cell_measures="".  Isn't that
>     right Charles?
>
>     Why are you tempted to turn off the ext_cell_measures for all
>     variables?  Then your output won't conform to the CMIP5 requirements.
>
>     In the latest CMOR tables, I have removed ext_cell_measures from
>     all the variables that we don't expect always to be on the
>     standard mesh (i.e., on the grid for which areacella is correct). 
>     This includes velocities and transports and closely related
>     fields, which are sometimes staggered relative to areacella.  I
>     would still be interested in hearing a clear explanation for why
>     there are additional fields carried on a completely different grid.
>
>     If users must compute the cell areas for only your grid, and for
>     all others they simply read the areacella field in, then you are
>     creating a special case that is completely unnecessary.
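
For reference, the computation a user would have to do by hand on a
regular latitude-longitude grid can be sketched as follows.  This is
only an illustration: it assumes a spherical Earth, uniform longitude
spacing, and latitude cell bounds given in degrees.  The radius value
and the function name are assumptions, and a model may use a different
radius.

```python
import math

EARTH_RADIUS_M = 6.371e6  # assumed mean Earth radius; a model may use another value


def cell_areas(lat_bounds_deg, n_lon, radius=EARTH_RADIUS_M):
    """Area (m^2) of one cell in each latitude row of a regular lat-lon grid.

    lat_bounds_deg lists the cell edges from south to north, so N + 1
    bounds describe N rows.  Each row is split into n_lon equal cells.
    """
    dlon = 2.0 * math.pi / n_lon
    areas = []
    for south, north in zip(lat_bounds_deg[:-1], lat_bounds_deg[1:]):
        # Exact area on a sphere: R^2 * dlon * (sin(north) - sin(south))
        band = math.sin(math.radians(north)) - math.sin(math.radians(south))
        areas.append(radius ** 2 * dlon * band)
    return areas
```

Summing the result over all rows and longitudes should recover the
surface area of the sphere, which is a convenient sanity check.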
>
>     That aside, doesn't the approach of providing alternative grid
>     areas need more discussion?
>
>       1. how should we produce these.  The most natural approach I can
>     think of is to modify the fx MIP tables to add in areacellb (or
>     whatever we choose to call it) and then output through CMOR - this
>     will maximise the chance of consistency between different grid
>     area files for any one model.
>
>       2. how should we reference these additional areas from a
>     variable? I could call cmor.set_variable_attribute(varid,
>     'ext_cell_measures', 'areacellb') - but in the tests I've done on
>     CMOR 2.4 this only does half the job: it puts the appropriate
>     ext_cell_measures attribute into the file, but does nothing with
>     associatedFiles.
>
>     I don't think it is a high priority to standardize this
>     immediately.  We will want CMOR to place the fields in the
>     subdirectory fx, so I need to check with Charles whether this
>     requires the variable to appear in table fx.  If not, I would
>     probably build an entirely new table similar to fx, but with only
>     the additional variables.  This way you won't have to modify your
>     table if a new fx table comes out.  As for referencing these
>     additional area variables, I think if you include area:
>     <area_name> in the ext_cell_measures attribute, then if CMOR isn't
>     already doing this, Charles can modify construction of
>     associated_files to include something following the template
>     "<area_name>: <area_name>_fx_IPSL-CM5_historical_r0i0p0.nc"   What
>     do you think, Charles?
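
The template above could be filled in along these lines.  This is a
sketch only, not CMOR's actual behaviour: both function names are
hypothetical, and the model and experiment names are passed in by the
caller rather than taken from CMOR's internal state.

```python
def associated_file_entry(area_name, model, experiment, rip="r0i0p0"):
    """Build '<area_name>: <area_name>_fx_<model>_<experiment>_<rip>.nc'."""
    return f"{area_name}: {area_name}_fx_{model}_{experiment}_{rip}.nc"


def entries_for(ext_cell_measures, model, experiment):
    """Build one entry per variable named in e.g. 'area: areacellb'."""
    # Tokens alternate 'measure:' and variable name; keep the names only.
    names = ext_cell_measures.split()[1::2]
    return " ".join(associated_file_entry(n, model, experiment) for n in names)
```

For example, entries_for("area: areacellb", "IPSL-CM5", "historical")
gives "areacellb: areacellb_fx_IPSL-CM5_historical_r0i0p0.nc".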
>
>     Clearly these may have been things you were going to cover - but
>     ran out of time to, in which case sorry.
>
>     I think another scenario that still needs some thought is one
>     where a data provider has submitted data and published it in ESG. 
>     They then realise they made a mistake - they should have turned
>     ext_cell_measures off, but didn't (or vice versa). What happens in
>     this case?  (We have kind of done this in that we have sent data
>     with incorrect cell_measures to the BADC - but have caught the
>     issue before ingestion into ESG  - I don't believe we will always
>     be this lucky).   You'll probably see through why I'm asking this
>     question about meta-data updates again now, so I may as well be
>     explicit... If we choose to turn off ext_cell_measures for all our
>     diagnostics on this initial submission - what are our options for
>     recovering from this if we later found the decision to submit
>     without ext_cell_measures was making our data hard to use?
>
>
>     Please don't turn off ext_cell_measures (in general).   I think
>     you could easily write a script to remove the cell_measures
>     attribute using netCDF tools, but adding it would require
>     rewriting the entire file.
>
>     Best regards,
>     Karl
>
>     Jamie
>
>         ------------------------------------------------------------------------
>
>         *From:*Karl Taylor [mailto:taylor13 at llnl.gov]
>         *Sent:* 29 October 2010 02:15
>         *To:* Bentley, Philip
>         *Cc:* V. Balaji; martin.juckes at stfc.ac.uk
>         <mailto:martin.juckes at stfc.ac.uk>; go-essp-tech at ucar.edu
>         <mailto:go-essp-tech at ucar.edu>; cmor at lists.llnl.gov
>         <mailto:cmor at lists.llnl.gov>; Kyle.Olivo at noaa.gov
>         <mailto:Kyle.Olivo at noaa.gov>; Doutriaux, Charles;
>         Kettleborough, Jamie
>         *Subject:* Re: [Go-essp-tech] CMOR and cell_measures issues
>
>         Dear all,
>
>         I meant to try to address all the stuff in this discussion,
>         but won't have time today.  This email is just to say that I
>         think we should insist that the cell_area files (areacella and
>         areacello) be placed in the archive, even if there are also
>         gridspec files.   The ext_cell_measures attribute should also
>         be included for fields that are on the "standard" grid (i.e.,
>         the one with the cell areas stored in areacella or
>         areacello).  If there are other fields for which the standard
>         areas are inappropriate and where your scientists think it is
>         important to provide cell areas, then I recommend that you
>         create specially named variables and place them in the "fx"
>         subdirectories.   For variables not on the "standard" grid
>         (i.e., the grid of areacella or areacello), you should "turn
>         off" the ext_cell_measures attribute.
>
>         I don't expect most groups to produce gridspec files, so most
>         analysts will be looking for areas in the areacella and
>         areacello variables, not the gridspec files.  This is why you
>         should write the areacella and areacello files even if you
>         also write the gridspec files.
>
>         Also, could you please explain why you prefer not to duplicate
>         the "fx" fields in each experiment's directory tree.
>
>         Best regards,
>         Karl
>
>         On 10/25/10 7:12 AM, Bentley, Philip wrote:
>
>         Hi Balaji,
>
>             Phil, I'm very impressed that Had will have gridspec files,
>             is this a done deal? I've been so pessimistic about this that
>             I was wondering if even we should do one ourselves.
>
>         Nope, not a done deal yet :-(
>
>         In line with the CMIP5 expt design doc, we don't really need to provide
>         gridspec files since all our model output is on either regular or
>         uniform grids (i.e. simple Cartesian product of lat & long).
>
>         However, this whole cell_measures business prompted me to revisit the
>         gridspec tools and output, which reminded me that the gridspec netcdf
>         files include a cell area variable. Which in turn means we wouldn't need
>         to provide a separate file (or files) for cell areas. Hence we could
>         drop the ext_cell_measures attribute from our CMIP5 output files.
>
>         Using the gridspec tools may be a quick and easy way for us to provide
>         cell area info if we need to.
>
>         Caveat: from a quick glance it looks like the netcdf files produced by
>         the gridspec tools are not CF compliant. Is this an issue? Presumably
>         it is if we want all the data in the CMIP5 archive to be CF compliant.
>         (NB: it could be I'm not running with the very latest version of the
>         tools - but I couldn't see a more recent version on the gfdl web site).
>
>             You know of course that gridspec says you can supply
>
>                 gridspec_fx_HadGEM2-ES_atm_pgrid.nc
>                 gridspec_fx_HadGEM2-ES_atm_ugrid.nc
>                 gridspec_fx_HadGEM2-ES_atm_vgrid.nc
>                 gridspec_fx_HadGEM2-ES_atm_uvgrid.nc
>
>             as one single supergrid...
>
>         If I could figure out how to output all 7 or 8 atm/ocn (sub-)grids to a
>         single netcdf file I would, but the available documentation (e.g. for
>         make_hgrid) isn't clear on this point. Sorry, that's probably just me
>         being dumb! But if there is updated documentation then please point me
>         to it. If necessary I could concatenate variables afterwards using NCO
>         tools.
>
>         Right now I'm trying to figure out how to create a gridspec file for our
>         HadGEM2 ocean model, which uses a stretched (i.e. tartan/plaid) grid:
>         longitudes are evenly spaced, latitudes vary from 1 deg to 1/3 deg.
>         (Looks like I need to use the --my_grid_file option to supply the
>         lat/long coords).
>
>             But if you're doing gridspec at all, I will concede this
>             point :-). Let's both do these separate gridspecs for now.
>
>         Works for me.
>
>         I think we're suffering from 'early-adopter syndrome' :-/
>
>         Cheers,
>         Phil
>
>             Bentley, Philip writes:
>
>                 Hi Karl,
>
>                 A somewhat belated follow-up question in connection with this
>                 proposal (and with some slight overlap with Jamie's email which
>                 crossed on the ether)...
>
>                 As things stand the files named in the 'associated_files' attribute
>                 appear thus (using our RCP 4.5 simulation as an example):
>
>                 "... gridspecFile: gridspec_fx_HadGEM2-ES_rcp45_r0i0p0.nc areacella:
>                 areacella_fx_HadGEM2-ES_rcp45_r0i0p0.nc"
>
>                 Are the <expt_id>_<rip> parts (i.e. 'rcp45_r0i0p0.nc') actually
>                 required? AFAIK, our gridspec/cellarea files will not change from
>                 one simulation to the next using the same model (HadGEM2-ES in
>                 this case). Since, like most centers, we will be running large
>                 numbers of simulations using the same model, it looks like we
>                 would need to create numerous duplicates of the gridspec/cellarea
>                 files - or lots of symlinks - in order for these references to
>                 make sense. Unless you are planning to manage that on our behalf
>                 somehow...?
>
>                 I think our 4 gridspec files for the HadGEM2 atm grids are likely
>                 to be called something like...
>
>                 gridspec_fx_HadGEM2-ES_atm_pgrid.nc
>                 gridspec_fx_HadGEM2-ES_atm_ugrid.nc
>                 gridspec_fx_HadGEM2-ES_atm_vgrid.nc
>                 gridspec_fx_HadGEM2-ES_atm_uvgrid.nc
>
>                 So without any simulation-specific info. (There would also be files
>                 for the ocean grids)
>
>                 As it happens the gridspec files contain grid cell areas, so I'm
>                 now wondering if we'd even supply both?
>
>                 I'd be interested to hear your thoughts on this. I may be
>                 mis-understanding something/everything :-)
>
>                 Regards
>                 Phil
>
>           
>
>