[Go-essp-tech] CMOR and cell_measures issues

Kettleborough, Jamie jamie.kettleborough at metoffice.gov.uk
Tue Nov 2 11:42:28 MDT 2010


Hello Karl,
 
Thanks for the clarification. Before we make a final decision on which
variables should include cell_measures, we'll take into account what you
have said here.
 
The variables that we have problems with (the diagnostics that are
neither velocities/transports nor on the primary mesh) are, I think, the
time-mean pressure-level diagnostics.  Without looking at the actual
meshes and MIP tables to confirm, I think this includes things like ta
and zg from the Amon and day tables, and ta from 6hrPlev.   How important is it
that users can (easily) take area means of these pressure level
diagnostics? 
 
I'm still unclear what our options are if we submit data that we later
find has inappropriate meta-data.   Any thoughts on this?
 
Jamie


________________________________

	From: Karl Taylor [mailto:taylor13 at llnl.gov] 
	Sent: 01 November 2010 18:19
	To: Kettleborough, Jamie
	Cc: Bentley, Philip; V. Balaji; martin.juckes at stfc.ac.uk;
go-essp-tech at ucar.edu; cmor at lists.llnl.gov; Kyle.Olivo at noaa.gov;
Doutriaux, Charles
	Subject: Re: [Go-essp-tech] CMOR and cell_measures issues
	
	
	Hi Jamie,
	
	I'm arguing that given that cell_measures (or ext_cell_measures)
will *not* appear in files containing fields most likely to be carried
on a mesh different from the "primary" mesh (because I've removed those
from the requested output table, and hence the CMOR tables), I think it
is better to *assume* the remaining variables are on the "primary" mesh.
I would be surprised if more than 1% of the variables written have
cell_measures pointing to an incorrect area field.  If one does, I assume
the area variable will have different lat x lon dimensions than the
variable itself, so it will be difficult for a user to mistakenly apply
the areas.
	
	So rather than advocate completeness over correctness, I'd say
I'm advocating "almost perfect" versus "perfect".
	
	If the number of offending cases is much larger than I'm
imagining, please let me know.
	
	Best regards,
	Karl
	
	On 11/1/10 10:09 AM, Kettleborough, Jamie wrote: 

		Hello Karl,
		 
		Thanks for this reply.  Putting aside the issue of
whether this is really ext_cell_measures or cell_measures, I think,
given the resources we have locally, we have to choose between
correctness and completeness.  The reason we are tempted to turn off
ext_cell_measures is that it is the least-effort way we can see of submitting
data that is correct.  I think you are suggesting going for completeness,
even if we risk submitting some data with ext_cell_measures that is
incorrect.  Obviously this is *my* interpretation of what you are
saying.  Yes, we can go for both correctness and completeness, but this
will take us some effort: we need an extra component in our system that
can recognise which cell areas to assign to which variables (with
minimum error) - and we (like everyone) have lots of demands on our
effort at the moment - and we have to make judgements about where to
prioritise.  (This isn't supposed to be a sob story - just trying to
explain why we are tempted...)
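The "extra component" described here could, in its simplest form, compare each diagnostic's horizontal coordinates with those of every available area field and pick the one that matches. The following is a minimal Python sketch under that assumption (cell-centre coordinates are known for both the diagnostic and the area fields); `HorizontalGrid`, `same_grid` and `ext_cell_measures_for` are hypothetical names, not part of CMOR. The returned string is the value one would set for ext_cell_measures, with "" meaning "turn it off".

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class HorizontalGrid:
    """Cell-centre latitudes and longitudes (degrees) of one horizontal mesh."""
    lat: Sequence[float]
    lon: Sequence[float]


def same_grid(a: HorizontalGrid, b: HorizontalGrid, tol: float = 1e-6) -> bool:
    """True if both meshes have the same shape and coordinates agree within tol."""
    if len(a.lat) != len(b.lat) or len(a.lon) != len(b.lon):
        return False
    lats_match = all(abs(x - y) <= tol for x, y in zip(a.lat, b.lat))
    lons_match = all(abs(x - y) <= tol for x, y in zip(a.lon, b.lon))
    return lats_match and lons_match


def ext_cell_measures_for(var_grid: HorizontalGrid,
                          area_grids: Dict[str, HorizontalGrid]) -> str:
    """Pick the area field whose mesh matches the variable's mesh.

    Returns 'area: <name>' for the first match, or '' (i.e. turn the
    attribute off) when no available area field is on the same mesh.
    """
    for name, grid in area_grids.items():
        if same_grid(var_grid, grid):
            return f"area: {name}"
    return ""


# Hypothetical example: one field on the primary mesh, one on a staggered mesh.
primary = HorizontalGrid(lat=[-45.0, 45.0], lon=[90.0, 270.0])
staggered = HorizontalGrid(lat=[-90.0, 0.0, 90.0], lon=[0.0, 180.0])
areas = {"areacella": primary}
print(ext_cell_measures_for(primary, areas))          # area: areacella
print(repr(ext_cell_measures_for(staggered, areas)))  # ''
```

Matching on coordinate values rather than only on array sizes is a deliberate choice in this sketch: a staggered mesh can have the same number of points as the primary one while its cell centres are offset.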
		 
		Would you recommend 'completeness' over 'correctness' -
have I interpreted you correctly?  What are the options for correcting
incorrect meta-data once data is ingested into ESG?
		 
		Jamie
		 


________________________________

			From: Karl Taylor [mailto:taylor13 at llnl.gov] 
			Sent: 29 October 2010 21:36
			To: Kettleborough, Jamie
			Cc: Bentley, Philip; V. Balaji;
martin.juckes at stfc.ac.uk; go-essp-tech at ucar.edu; cmor at lists.llnl.gov;
Kyle.Olivo at noaa.gov; Doutriaux, Charles
			Subject: Re: [Go-essp-tech] CMOR and
cell_measures issues
			
			
			Dear Jamie and Charles (a couple of questions
for you),
			

				Hello Karl,
				 
				I think the recommended way to 'turn
off' ext_cell_measures is to make a call to
cmor.set_variable_attribute(varid, 'ext_cell_measures', '').  Is that
right?  We are very tempted to do this for all variables, basically
overriding the MIP tables.  How big a problem do you think this will be
for data users?  Our grid is pretty straightforward and users can
calculate cell areas from the latitudes.
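For a grid that is a simple product of latitude and longitude, the calculation users would have to do is the exact spherical formula A = R^2 * dlon * (sin(lat_north) - sin(lat_south)), with dlon in radians. A minimal Python sketch, assuming a spherical Earth of radius 6.371e6 m (a model may use its own radius) and that the cell edges are known; `cell_areas` is a hypothetical helper, not a CMOR function.

```python
import math
from typing import List, Sequence

# Assumed mean Earth radius in metres; a model may use a different value.
EARTH_RADIUS_M = 6.371e6


def cell_areas(lat_bounds: Sequence[float], lon_bounds: Sequence[float],
               radius: float = EARTH_RADIUS_M) -> List[List[float]]:
    """Cell areas (m^2) on a grid that is a Cartesian product of lat and lon.

    lat_bounds: n_lat + 1 latitude edges in degrees (spacing may vary)
    lon_bounds: n_lon + 1 longitude edges in degrees
    Each cell: A = R^2 * dlon (radians) * |sin(lat_north) - sin(lat_south)|
    """
    areas = []
    for j in range(len(lat_bounds) - 1):
        # Difference of sin(latitude) across the band; area is proportional to it.
        band = abs(math.sin(math.radians(lat_bounds[j + 1]))
                   - math.sin(math.radians(lat_bounds[j])))
        row = [radius ** 2 * abs(math.radians(lon_bounds[i + 1] - lon_bounds[i])) * band
               for i in range(len(lon_bounds) - 1)]
        areas.append(row)
    return areas


# A 10 x 20 degree global grid: the areas should sum to 4*pi*R^2.
lat_b = [-90.0 + 10.0 * k for k in range(19)]
lon_b = [20.0 * k for k in range(19)]
total = sum(sum(row) for row in cell_areas(lat_b, lon_b))
print(math.isclose(total, 4 * math.pi * EARTH_RADIUS_M ** 2))  # True
```

Because the latitude edges need not be evenly spaced, the same sketch would also cover a grid whose latitude spacing is stretched while longitudes stay uniform.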


			Yes, if the cell areas stored in areacella are
not appropriate for a particular field, and the requested output tables
say that ext_cell_measures includes areacella, then you should call the
set attribute function to reset ext_cell_measures="".  Isn't that right
Charles?
			
			Why are you tempted to turn off the
ext_cell_measures for all variables?  Then your output won't conform to
the CMIP5 requirements.
			
			In the latest CMOR tables, I have removed
ext_cell_measures from all the variables that we don't expect always to
be on the standard mesh (i.e., on the grid for which areacella is
correct).  This includes velocities and transports and closely related
fields, which are sometimes staggered relative to areacella.  I would
still be interested in hearing a clear explanation for why there are
additional fields carried on a completely different grid. 
			
			If users must compute the cell areas for only
your grid, and for all others they simply read the areacella field in,
then you are creating a special case that is completely unnecessary.
			

				 
				That aside, doesn't the approach of
providing alternative grid areas need more discussion?
				 
				  1. How should we produce these?  The
most natural approach I can think of is to modify the fx MIP tables to
add in areacellb (or whatever we choose to call it) and then output
through CMOR - this will maximise the chance of consistency between
different grid area files for any one model. 
				 
				  2. How should we reference these
additional areas from a variable? I could call
cmor.set_variable_attribute(varid, 'ext_cell_measures', 'areacellb'),
but in the tests I've done on CMOR 2.4 this only does half the job: it
puts the appropriate ext_cell_measures attribute into the file, but does
nothing with associatedFiles.

			I don't think it is a high priority to
standardize this immediately.  We will want CMOR to place the fields in
the subdirectory fx, so I need to check with Charles whether this
requires the variable to appear in table fx.  If not, I would probably
build an entirely new table similar to fx, but with only the additional
variables.  This way you won't have to modify your table if a new fx
table comes out.  As for referencing these additional area variables, I
think if you include area: <area_name> in the ext_cell_measures
attribute, then if CMOR isn't already doing this, Charles can modify
construction of associated_files to include something following the
template "<area_name>: <area_name>_fx_IPSL-CM5_historical_r0i0p0.nc"
What do you think, Charles?
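The proposed template can be sketched in a few lines of Python. This assumes ext_cell_measures holds CF-style "measure: name" pairs and fills in the filename pattern exactly as written above; `associated_file_entries` is a hypothetical helper for illustration, not how CMOR itself builds the attribute.

```python
import re


def associated_file_entries(ext_cell_measures: str, model: str,
                            experiment: str, ensemble: str = "r0i0p0") -> str:
    """Build associated_files entries from an ext_cell_measures value.

    Assumes ext_cell_measures holds CF-style 'measure: name' pairs, and
    follows the template '<name>: <name>_fx_<model>_<experiment>_<ensemble>.nc'.
    """
    # Capture the variable name that follows each 'measure:' label.
    names = re.findall(r"\w+:\s*(\w+)", ext_cell_measures)
    return " ".join(f"{n}: {n}_fx_{model}_{experiment}_{ensemble}.nc" for n in names)


print(associated_file_entries("area: areacellb", "IPSL-CM5", "historical"))
# areacellb: areacellb_fx_IPSL-CM5_historical_r0i0p0.nc
```

An empty ext_cell_measures (the "turned off" case) yields an empty string, so no area file would be listed.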
			

				 
				Clearly these may have been things you
were going to cover but ran out of time for, in which case, sorry.
				 
				I think another scenario that still
needs some thought is one where a data provider has submitted data and
published it in ESG.  They then realise they made a mistake - they
should have turned ext_cell_measures off, but didn't (or vice versa).
What happens in this case?  (We have kind of done this, in that we have
sent data with incorrect cell_measures to the BADC, but caught the
issue before ingestion into ESG.  I don't believe we will always be
this lucky.)   You'll probably see why I'm asking this question
about meta-data updates again now, so I may as well be explicit... If we
choose to turn off ext_cell_measures for all our diagnostics on this
initial submission, what are our options for recovering from this if we
later find the decision to submit without ext_cell_measures is making
our data hard to use?


			Please don't turn off ext_cell_measures (in
general).   I think you could easily write a script to remove the
cell_measures attribute using netCDF tools, but adding it would require
rewriting the entire file.
			
			Best regards,
			Karl
			

				 
				Jamie
				

________________________________

				From: Karl Taylor
[mailto:taylor13 at llnl.gov] 
				Sent: 29 October 2010 02:15
				To: Bentley, Philip
				Cc: V. Balaji; martin.juckes at stfc.ac.uk;
go-essp-tech at ucar.edu; cmor at lists.llnl.gov; Kyle.Olivo at noaa.gov;
Doutriaux, Charles; Kettleborough, Jamie
				Subject: Re: [Go-essp-tech] CMOR and
cell_measures issues
				
				
				Dear all,
				
				I meant to try to address all the stuff
in this discussion, but won't have time today.  This email is just to
say that I think we should insist that the cell_area files (areacella
and areacello) be placed in the archive, even if there are also gridspec
files.   The ext_cell_measures attribute should also be included for
fields that are on the "standard" grid (i.e., the one with the cell
areas stored in areacella or areacello).  If there are other fields for
which the standard areas are inappropriate and where your scientists
think it is important to provide cell areas, then I recommend that you
create specially named variables and place them in the "fx"
subdirectories.   For variables not on the "standard" grid (i.e., the
grid of areacella or areacello), you should "turn off" the
ext_cell_measures attribute.
				
				I don't expect most groups to produce
gridspec files, so most analysts will be looking for areas in the
areacella and areacello variables, not the gridspec files.  This is why
you should write the areacella and areacello files even if you also
write the gridspec files.
				
				Also, could you please explain why you
prefer not to duplicate the "fx" fields in each experiment's directory
tree. 
				
				Best regards,
				Karl
				
				On 10/25/10 7:12 AM, Bentley, Philip
wrote: 

				Hi Balaji,
				 

				Phil, I'm very impressed that Had will
have gridspec files, 
				is this a done deal? I've been so
pessimistic about this that 
				I was wondering if even we should do one
ourselves.

				Nope, not a done deal yet :-(
				
				In line with the CMIP5 expt design doc,
we don't really need to provide
				gridspec files since all our model
output is on either regular or
				uniform grids (i.e. simple Cartesian
product of lat & long).
				
				However, this whole cell_measures
business prompted me to revisit the
				gridspec tools and output, which
reminded me that the gridspec netcdf
				files include a cell area variable.
Which in turn means we wouldn't need
				to provide a separate file (or files)
for cell areas. Hence we could
				drop the ext_cell_measures attribute
from our CMIP5 output files.
				
				Using the gridspec tools may be a quick
and easy way for us to provide
				cell area info if we need to.
				
				Caveat: from a quick glance it looks
like the netcdf files produced by
				the gridspec tools are not CF compliant.
Is this an issue? Presumably
				it is if we want all the data in the
CMIP5 archive to be CF compliant.
				(NB: it could be I'm not running with
the very latest version of the
				tools - but I couldn't see a more recent
version on the gfdl web site).
				

				You know of course that gridspec says
you can supply
				

				gridspec_fx_HadGEM2-ES_atm_pgrid.nc
				gridspec_fx_HadGEM2-ES_atm_ugrid.nc
				gridspec_fx_HadGEM2-ES_atm_vgrid.nc
				gridspec_fx_HadGEM2-ES_atm_uvgrid.nc

				as one single supergrid...

				If I could figure out how to output all
7 or 8 atm/ocn (sub-)grids to a
				single netcdf file I would, but the
available documentation (e.g. for
				make_hgrid) isn't clear on this point.
Sorry, that's probably just me
				being dumb! But if there is updated
documentation then please point me
				to it. If necessary I could concatenate
variables afterwards using NCO
				tools.
				
				Right now I'm trying to figure out how
to create a gridspec file for our
				HadGEM2 ocean model, which uses a
stretched (i.e. tartan/plaid) grid:
				longitudes are evenly spaced, latitudes
vary from 1 deg to 1/3 deg.
				(Looks like I need to use the
--my_grid_file option to supply the
				lat/long coords).

				But if you're doing gridspec at all, I
will concede this 
				point:-). Let's both do these separate
gridspecs for now.

				Works for me.
				
				I think we're suffering from
'early-adopter syndrome' :-/
				
				Cheers,
				Phil
				

				Bentley, Philip writes:
				

				Hi Karl,
				
				A somewhat belated follow-up question in connection with
				this proposal (and with some slight overlap with Jamie's email,
				which crossed in the ether)...
				
				As things stand the files named in the
'associated_files' attribute 
				appear thus (using our RCP 4.5
simulation as an example):
				
				"... gridspecFile:
gridspec_fx_HadGEM2-ES_rcp45_r0i0p0.nc areacella:
				areacella_fx_HadGEM2-ES_rcp45_r0i0p0.nc"
				
				Are the <expt_id>_<rip> parts (i.e. 'rcp45_r0i0p0.nc') actually
				required? AFAIK, our gridspec/cellarea files will not change
				from one simulation to the next using the same model
				(HadGEM2-ES in this case).
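Whatever the naming decision, a user or script looking for the area file has to pull it out of associated_files and may want to see which parts of the name are experiment-specific. A minimal Python sketch, assuming the space-separated "key: filename" layout of the example quoted above and the five underscore-separated fields of its file names (so the model name must contain no underscores); both function names are hypothetical.

```python
from typing import Dict


def parse_associated_files(attr: str) -> Dict[str, str]:
    """Split an associated_files value of the form 'key: file key: file ...'.

    Tokens that are not part of a 'key:' pair (such as the '...' elision in
    the quoted example) are skipped.
    """
    tokens = attr.split()
    pairs: Dict[str, str] = {}
    i = 0
    while i < len(tokens) - 1:
        if tokens[i].endswith(":"):
            pairs[tokens[i][:-1]] = tokens[i + 1]
            i += 2
        else:
            i += 1
    return pairs


def filename_parts(filename: str) -> Dict[str, str]:
    """Split '<variable>_<table>_<model>_<experiment>_<ensemble>.nc' into fields."""
    stem = filename[:-3] if filename.endswith(".nc") else filename
    keys = ("variable", "table", "model", "experiment", "ensemble")
    return dict(zip(keys, stem.split("_")))


attr = ("... gridspecFile: gridspec_fx_HadGEM2-ES_rcp45_r0i0p0.nc "
        "areacella: areacella_fx_HadGEM2-ES_rcp45_r0i0p0.nc")
files = parse_associated_files(attr)
print(filename_parts(files["areacella"])["experiment"])  # rcp45
```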

				Since, like most centers, we will be running large numbers of
				simulations using the same model, it looks like we would need to
				create numerous duplicates of the gridspec/cellarea files (or lots
				of symlinks) in order for these references to make sense. Unless
				you are planning to manage that on our behalf somehow...?
				
				I think our 4 gridspec files for the HadGEM2 atm grids are
				likely to be called something like...
				
				gridspec_fx_HadGEM2-ES_atm_pgrid.nc
				gridspec_fx_HadGEM2-ES_atm_ugrid.nc
				gridspec_fx_HadGEM2-ES_atm_vgrid.nc
				gridspec_fx_HadGEM2-ES_atm_uvgrid.nc
				
				So without any simulation-specific info. (There would also be files
				for the ocean grids.)
				
				As it happens, the gridspec files contain grid cell areas, so I'm now
				wondering whether we'd even need to supply both.
				
				I'd be interested to hear your thoughts on this. I may be
				misunderstanding something/everything :-)
				
				Regards
				Phil

				 
