[Go-essp-tech] CMOR and cell_measures issues
Karl Taylor
taylor13 at llnl.gov
Mon Nov 1 16:12:36 MDT 2010
Couldn't we simply say that we certify that CMIP5 data conforms to the
CF 1.4 standard except that the cell_measures variables may be found in
an external file, rather than the referencing file? That way the data
will pass the CMIP5 QC checks, which don't require the
cell_measures variables to be found in the referencing file. I think
the decision between cell_measures and ext_cell_measures should be based
on which one will be most useful to the users. In CMIP5, users should
be able to find the cell areas even without cell_measures, so I'm not
sure this decision is all that critical.
regards,
Karl
On 11/1/10 2:30 PM, martin.juckes at stfc.ac.uk wrote:
>
> Hello All,
>
> Sorry to be repetitive, but I want to repeat a question I raised
> earlier today (Monday in the UK) that hasn’t been answered yet: will
> the proposed change to the CF checker be matched by a change to the
> conformance document so that CF 1.4 conformance no longer demands
> that variables named in cell_measures be in the same file?
>
> I’ve also copied Bryan and Michael in again, so as to get a quality
> control perspective – as it worries me that an agreement made in a
> rush might not meet the expectations of the quality control we have
> committed to.
>
> Regards,
>
> Martin
>
> *From:* Karl Taylor [mailto:taylor13 at llnl.gov]
> *Sent:* 01 November 2010 18:19
> *To:* Kettleborough, Jamie
> *Cc:* Bentley, Philip; V. Balaji; Juckes, Martin (STFC,RAL,SSTD);
> go-essp-tech at ucar.edu; cmor at lists.llnl.gov; Kyle.Olivo at noaa.gov;
> Doutriaux, Charles
> *Subject:* Re: [Go-essp-tech] CMOR and cell_measures issues
>
> Hi Jamie,
>
> I'm arguing that given that cell_measures (or ext_cell_measures) will
> *not* appear in files containing fields most likely to be carried on a
> mesh different from the "primary" mesh (because I've removed those
> from the requested output table, and hence the CMOR tables), I think
> it is better to *assume* the remaining variables are on the "primary"
> mesh. I would be surprised if more than 1% of the variables written
> will have cell_measures pointing to an incorrect area field. If one
> does, I assume the area variable will have different lat x lon
> dimensions than the variable itself, so it will be difficult for a
> user to mistakenly apply the areas.
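The safeguard described here, that an area field on the wrong mesh will usually have different dimensions from the data, can be made explicit in analysis code. A minimal sketch in plain Python, assuming 2-D lat x lon fields held as rectangular nested lists; the function name area_weighted_mean is hypothetical and not part of CMOR or any CMIP5 tool:

```python
def area_weighted_mean(field, areas):
    """Area-weighted mean of a 2-D (lat x lon) field.

    Both arguments are rectangular nested lists. A shape mismatch is
    treated as a sign that the areas belong to a different mesh.
    """
    field_shape = (len(field), len(field[0]))
    area_shape = (len(areas), len(areas[0]))
    if field_shape != area_shape:
        raise ValueError(
            f"field is {field_shape} but areas are {area_shape}; "
            "the area variable is probably on a different mesh"
        )
    total_area = sum(a for row in areas for a in row)
    weighted_sum = sum(
        f * a
        for field_row, area_row in zip(field, areas)
        for f, a in zip(field_row, area_row)
    )
    return weighted_sum / total_area
```

A shape check of this kind only catches meshes with different dimensions. A staggered grid with the same lat x lon size as the primary mesh would pass unnoticed.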
>
> So rather than advocate completeness over correctness, I'd say I'm
> advocating "almost perfect" versus "perfect".
>
> If the number of offending cases is much larger than I'm imagining,
> please let me know.
>
> Best regards,
> Karl
>
> On 11/1/10 10:09 AM, Kettleborough, Jamie wrote:
>
> Hello Karl,
>
> thanks for this reply. Putting aside the issue of whether this is
> really ext_cell_measures or cell_measures then I think, given the
> resources we have locally, we have to make a choice of correctness vs
> completeness. The reason we are tempted to turn off ext_cell_measures
> is it is the least effort way we can see of submitting data that is
> correct. I think you are suggesting going for completeness - even if
> we risk submitting some data with ext_cell_measures that is
> incorrect. Obviously this is *my* interpretation of what you are
> saying. Yes we can go for both correctness and completeness, but this
> will take us some effort - we need an extra component in our system
> that can recognise which cell areas to assign to which variables (with
> minimum error) - and we (like everyone) have lots of demands on our
> effort at the moment - and we have to make judgements about where to
> prioritise. (This isn't supposed to be a sob story - just trying to
> explain why we are tempted...)
>
> Would you recommend 'completeness' over 'correctness' - have I
> interpreted you correctly? What are the options for correcting
> incorrect meta-data once data is ingested into ESG?
>
> Jamie
>
> ------------------------------------------------------------------------
>
> *From:* Karl Taylor [mailto:taylor13 at llnl.gov]
> *Sent:* 29 October 2010 21:36
> *To:* Kettleborough, Jamie
> *Cc:* Bentley, Philip; V. Balaji; martin.juckes at stfc.ac.uk;
> go-essp-tech at ucar.edu; cmor at lists.llnl.gov; Kyle.Olivo at noaa.gov;
> Doutriaux, Charles
> *Subject:* Re: [Go-essp-tech] CMOR and cell_measures issues
>
> Dear Jamie and Charles (a couple of questions for you),
>
> Hello Karl,
>
> I think the recommended way to 'turn off' ext_cell_measures is to
> make a call to cmor.set_variable_attribute(varid,
> 'ext_cell_measures', ''). Is that right? We are very tempted to
> do this for all variables - so basically overriding the MIP
> tables. How big a problem do you think this will be for data
> users - our grid is pretty straightforward and users can
> calculate cell_areas from the latitudes.
>
>
> Yes, if the cell areas stored in areacella are not appropriate for
> a particular field, and the requested output tables say that
> ext_cell_measures includes areacella, then you should call the set
> attribute function to reset ext_cell_measures="". Isn't that
> right Charles?
>
> Why are you tempted to turn off the ext_cell_measures for all
> variables? Then your output won't conform to the CMIP5 requirements.
>
> In the latest CMOR tables, I have removed ext_cell_measures from
> all the variables that we don't expect always to be on the
> standard mesh (i.e., on the grid for which areacella is correct).
> This includes velocities and transports and closely related
> fields, which are sometimes staggered relative to areacella. I
> would still be interested in hearing a clear explanation for why
> there are additional fields carried on a completely different grid.
>
> If users must compute the cell areas for only your grid, and for
> all others they simply read the areacella field in, then you are
> creating a special case that is completely unnecessary.
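For a regular latitude-longitude grid, the calculation from latitudes mentioned above can be sketched as follows. This assumes a spherical Earth of radius 6371 km and cell edges given in degrees. The function name and the radius value are illustrative assumptions, and the result need not match a model's own areacella exactly:

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed spherical Earth radius, in metres


def cell_areas(lat_edges_deg, lon_edges_deg, radius=EARTH_RADIUS_M):
    """Return areas[j][i] in m^2 for a regular lat-lon grid.

    lat_edges_deg and lon_edges_deg are the cell boundaries, so a grid
    with N cells along an axis has N + 1 edges along that axis.
    """
    areas = []
    for south, north in zip(lat_edges_deg[:-1], lat_edges_deg[1:]):
        # Difference of sin(latitude) gives the area fraction of the band.
        band = math.sin(math.radians(north)) - math.sin(math.radians(south))
        row = [
            radius ** 2 * band * math.radians(east - west)
            for west, east in zip(lon_edges_deg[:-1], lon_edges_deg[1:])
        ]
        areas.append(row)
    return areas
```

Summing the result over a global grid should recover the surface area of the sphere, 4 * pi * radius ** 2, which makes a convenient sanity check.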
>
> That aside, doesn't the approach of providing alternative grid
> areas need more discussion?
>
> 1. how should we produce these? The most natural approach I can
> think of is to modify the fx MIP tables to add in areacellb (or
> whatever we choose to call it) and then output through CMOR - this
> will maximise the chance of consistency between different grid
> area files for any one model.
>
> 2. how should we reference these additional areas from a
> variable? I could call cmor.set_variable_attribute(varid,
> 'ext_cell_measures', 'areacellb') - but in the tests I've done on
> CMOR 2.4 this only does half the job: it puts the appropriate
> ext_cell_measures attribute into the file, but does nothing with
> associatedFiles.
>
> I don't think it is a high priority to standardize this
> immediately. We will want CMOR to place the fields in the
> subdirectory fx, so I need to check with Charles whether this
> requires the variable to appear in table fx. If not, I would
> probably build an entirely new table similar to fx, but with only
> the additional variables. This way you won't have to modify your
> table if a new fx table comes out. As for referencing these
> additional area variables, I think if you include area:
> <area_name> in the ext_cell_measures attribute, then if CMOR isn't
> already doing this, Charles can modify construction of
> associated_files to include something following the template
> "<area_name>: <area_name>_fx_IPSL-CM5_historical_r0i0p0.nc" What
> do you think, Charles?
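The template proposed above can be illustrated with a short sketch. This is not how CMOR builds associated_files; it only shows the string the template would produce from an ext_cell_measures value. The function name is hypothetical, and the model and experiment names are passed in by the caller:

```python
def associated_files_entry(ext_cell_measures, model, experiment):
    """Build '<area_name>: <area_name>_fx_<model>_<experiment>_r0i0p0.nc'.

    ext_cell_measures is a string such as 'area: areacellb'. Only
    'area:' entries are used; any other measures are ignored here.
    """
    tokens = ext_cell_measures.split()
    entries = []
    # Tokens come in 'key: value' pairs, e.g. ['area:', 'areacellb'].
    for key, name in zip(tokens[0::2], tokens[1::2]):
        if key == "area:":
            entries.append(f"{name}: {name}_fx_{model}_{experiment}_r0i0p0.nc")
    return " ".join(entries)
```

For example, associated_files_entry("area: areacellb", "IPSL-CM5", "historical") gives the string "areacellb: areacellb_fx_IPSL-CM5_historical_r0i0p0.nc".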
>
> Clearly these may have been things you were going to cover - but
> ran out of time to, in which case sorry.
>
> I think another scenario that still needs some thought is one
> where a data provider has submitted data and published it in ESG.
> They then realise they made a mistake - they should have turned
> ext_cell_measures off, but didn't (or vice versa). What happens in
> this case? (We have kind of done this in that we have sent data
> with incorrect cell_measures to the BADC - but have caught the
> issue before ingestion into ESG - I don't believe we will always
> be this lucky). You'll probably see through why I'm asking this
> question about meta-data updates again now, so I may as well be
> explicit... If we choose to turn off ext_cell_measures for all our
> diagnostics on this initial submission - what are our options for
> recovering from this if we later found the decision to submit
> without ext_cell_measures was making our data hard to use?
>
>
> Please don't turn off ext_cell_measures (in general). I think
> you could easily write a script to remove the cell_measures
> attribute using netCDF tools, but adding it would require
> rewriting the entire file.
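One way to script the removal is with NCO's ncatted, which deletes an attribute from every variable when the variable field of the -a argument is left blank. The sketch below only builds and runs that command. It assumes NCO is installed and on the PATH, and the helper names are hypothetical rather than part of any library:

```python
import subprocess


def ncatted_delete_command(path, attribute="cell_measures"):
    """Return the ncatted command that deletes `attribute` from all variables.

    The -a argument has the form att_nm,var_nm,mode,att_type,att_val.
    A blank var_nm applies the edit to every variable; mode 'd' deletes.
    """
    return ["ncatted", "-a", f"{attribute},,d,,", path]


def strip_attribute(path, attribute="cell_measures"):
    """Run ncatted on `path`, editing the file in place."""
    subprocess.run(ncatted_delete_command(path, attribute), check=True)
```

Because ncatted edits the file in place when no output file is given, it would be prudent to run this on a copy first.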
>
> Best regards,
> Karl
>
> Jamie
>
> ------------------------------------------------------------------------
>
> *From:* Karl Taylor [mailto:taylor13 at llnl.gov]
> *Sent:* 29 October 2010 02:15
> *To:* Bentley, Philip
> *Cc:* V. Balaji; martin.juckes at stfc.ac.uk; go-essp-tech at ucar.edu;
> cmor at lists.llnl.gov; Kyle.Olivo at noaa.gov; Doutriaux, Charles;
> Kettleborough, Jamie
> *Subject:* Re: [Go-essp-tech] CMOR and cell_measures issues
>
> Dear all,
>
> I meant to try to address all the stuff in this discussion,
> but won't have time today. This email is just to say that I
> think we should insist that the cell_area files (areacella and
> areacello) be placed in the archive, even if there are also
> gridspec files. The ext_cell_measures attribute should also
> be included for fields that are on the "standard" grid (i.e.,
> the one with the cell areas stored in areacella or
> areacello). If there are other fields for which the standard
> areas are inappropriate and where your scientists think it is
> important to provide cell areas, then I recommend that you
> create specially named variables and place them in the "fx"
> subdirectories. For variables not on the "standard" grid
> (i.e., the grid of areacella or areacello), you should "turn
> off" the ext_cell_measures attribute.
>
> I don't expect most groups to produce gridspec files, so most
> analysts will be looking for areas in the areacella and
> areacello variables, not the gridspec files. This is why you
> should write the areacella and areacello files even if you
> also write the gridspec files.
>
> Also, could you please explain why you prefer not to duplicate
> the "fx" fields in each experiment's directory tree.
>
> Best regards,
> Karl
>
> On 10/25/10 7:12 AM, Bentley, Philip wrote:
>
> Hi Balaji,
>
>
>
> Phil, I'm very impressed that Had will have gridspec files,
> is this a done deal? I've been so pessimistic about this that
> I was wondering if even we should do one ourselves.
>
> Nope, not a done deal yet :-(
>
> In line with the CMIP5 expt design doc, we don't really need to provide
> gridspec files since all our model output is on either regular or
> uniform grids (i.e. simple cartesian product of lat & long).
>
> However, this whole cell_measures business prompted me to revisit the
> gridspec tools and output, which reminded me that the gridspec netcdf
> files include a cell area variable. Which in turn means we wouldn't need
> to provide a separate file (or files) for cell areas. Hence we could
> drop the ext_cell_measures attribute from our CMIP5 output files.
>
> Using the gridspec tools may be a quick and easy way for us to provide
> cell area info if we need to.
>
> Caveat: from a quick glance it looks like the netcdf files produced by
> the gridspec tools are not CF compliant. Is this an issue? Presumably
> it is if we want all the data in the CMIP5 archive to be CF compliant.
> (NB: it could be I'm not running with the very latest version of the
> tools - but I couldn't see a more recent version on the gfdl web site).
>
> You know of course that gridspec says you can supply
>
> gridspec_fx_HadGEM2-ES_atm_pgrid.nc
> gridspec_fx_HadGEM2-ES_atm_ugrid.nc
> gridspec_fx_HadGEM2-ES_atm_vgrid.nc
> gridspec_fx_HadGEM2-ES_atm_uvgrid.nc
>
> as one single supergrid...
>
> If I could figure out how to output all 7 or 8 atm/ocn (sub-)grids to a
> single netcdf file I would, but the available documentation (e.g. for
> make_hgrid) isn't clear on this point. Sorry, that's probably just me
> being dumb! But if there is updated documentation then please point me
> to it. If necessary I could concatenate variables afterwards using NCO
> tools.
>
> Right now I'm trying to figure out how to create a gridspec file for our
> HadGEM2 ocean model, which uses a stretched (i.e. tartan/plaid) grid:
> longitudes are evenly spaced, latitudes vary from 1 deg to 1/3 deg.
> (Looks like I need to use the --my_grid_file option to supply the
> lat/long coords).
>
> But if you're doing gridspec at all, I will concede this
> point:-). Let's both do these separate gridspecs for now.
>
> Works for me.
>
> I think we're suffering from 'early-adopter syndrome' :-/
>
> Cheers,
> Phil
>
> Bentley, Philip writes:
>
> Hi Karl,
>
> A somewhat belated follow-up question in connection with this proposal
> (and with some slight overlap with Jamie's email which crossed on the
> ether)...
>
> As things stand the files named in the 'associated_files' attribute
> appear thus (using our RCP 4.5 simulation as an example):
>
> "... gridspecFile: gridspec_fx_HadGEM2-ES_rcp45_r0i0p0.nc areacella:
> areacella_fx_HadGEM2-ES_rcp45_r0i0p0.nc"
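To make the structure of that attribute concrete, here is a small sketch that splits an associated_files string into a name-to-file mapping. It assumes the value is a whitespace-separated sequence of "key: value" pairs, as in the excerpt above, and it skips anything that does not fit that pattern (such as the elided leading part). The function name is hypothetical:

```python
def parse_associated_files(text):
    """Map each 'key:' token to the token that follows it.

    Tokens that are not part of a 'key: value' pair are ignored, so an
    elided or unexpected prefix does not break the parse.
    """
    tokens = text.split()
    result = {}
    for key, value in zip(tokens, tokens[1:]):
        if key.endswith(":") and not value.endswith(":"):
            result[key[:-1]] = value
    return result
```

Applied to the excerpt, this yields the entries gridspecFile and areacella, each pointing at a file name that still carries the experiment and ensemble parts.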
>
>
>
> Are the <expt_id>_<rip> parts (i.e. 'rcp45_r0i0p0.nc') actually
> required? AFAIK, our gridspec/cellarea files will not change from one
> simulation to the next using the same model (HadGEM2-ES in this case).
> Since, like most centers, we will be running large numbers of
> simulations using the same model, it looks like we would need to
> create numerous duplicates of the gridspec/cellarea files - or lots of
> symlinks - in order for these references to make sense. Unless you are
> planning to manage that on our behalf somehow...?
>
> I think our 4 gridspec files for the HadGEM2 atm grids are likely to
> be called something like...
>
> gridspec_fx_HadGEM2-ES_atm_pgrid.nc
> gridspec_fx_HadGEM2-ES_atm_ugrid.nc
> gridspec_fx_HadGEM2-ES_atm_vgrid.nc
> gridspec_fx_HadGEM2-ES_atm_uvgrid.nc
>
> So without any simulation-specific info. (There would also be files
> for the ocean grids)
>
> As it happens the gridspec files contain grid cell areas, so I'm now
> wondering if we'd even supply both?
>
> I'd be interested to hear your thoughts on this. I may be
> mis-understanding something/everything :-)
>
> Regards
> Phil
>