[Go-essp-tech] CMOR and cell_measures issues

martin.juckes at stfc.ac.uk martin.juckes at stfc.ac.uk
Tue Nov 2 03:14:40 MDT 2010


Hello Karl,

 

It's true that there is no explicit requirement for complete
CF-compliance in the QC level 2 documentation.

 

I'll assume for the time being that there is no option of changing the
CF 1.4 conformance document, so that CMOR v2.2 output will remain
non-compliant even if the CF checker is adjusted to pass it with a
warning.

 

In terms of what is easiest for users, I believe that having accurate
metadata in the file is crucial. I can see Jonathan's point that
uniformity across the archive is also important, but I would rate that
as a lower priority. Given our general commitment to promoting CF
compliant data, I think that it would be a big mistake to keep producing
non-compliant data now that we have identified the problem. Especially
when the metadata in the files explicitly states that it is CF
compliant.

 

It appears that the volume of CMOR v2.2 data is too large for easy
correction, so perhaps clearly documenting the departure from the CF
convention is the best option - though this is not ideal as there is
nothing in the files which will tell users where to find the relevant
information.

 

I also think that the departure from CF compliance should be documented
in the QC metadata - but that can be dealt with later.

 

Regards,

Martin

From: Karl Taylor [mailto:taylor13 at llnl.gov] 
Sent: 01 November 2010 22:13
To: Juckes, Martin (STFC,RAL,SSTD)
Cc: jamie.kettleborough at metoffice.gov.uk; Lawrence, Bryan
(STFC,RAL,SSTD); lautenschlager at dkrz.de;
philip.bentley at metoffice.gov.uk; V.Balaji at noaa.gov;
go-essp-tech at ucar.edu; cmor at lists.llnl.gov; Kyle.Olivo at noaa.gov;
Doutriaux, Charles
Subject: Re: [Go-essp-tech] CMOR and cell_measures issues

 

Couldn't we simply say that we certify that CMIP5 data conforms to the
CF 1.4 standard except that the cell_measures variables may  be found in
an external file, rather than the referencing file.  That way the data
will pass the CMIP5 QC checks which  don't include requiring the
cell_measures variables to be found in the referencing file.   I think
the decision between cell_measures and ext_cell_measures should be based
on which one will be most useful to the users.  In CMIP5, users should
be able to find the cell areas even without cell_measures, so I'm not
sure this decision is all that critical.

regards,
Karl



On 11/1/10 2:30 PM, martin.juckes at stfc.ac.uk wrote: 

Hello All,

 

Sorry to be repetitive, but I want to repeat a question I raised earlier
today (Monday in the UK) that hasn't been answered yet: will the proposed
change to the CF checker be matched to a change to the conformance
document so that the CF 1.4 conformance no longer demands that variables
named in cell_measures be in the same file? 

 

I've also copied Bryan and Michael in again, so as to get a quality control
perspective - as it worries me that an agreement made in a rush might
not meet the expectations of the quality control we have committed to.

 

Regards,

Martin 

 

From: Karl Taylor [mailto:taylor13 at llnl.gov] 
Sent: 01 November 2010 18:19
To: Kettleborough, Jamie
Cc: Bentley, Philip; V. Balaji; Juckes, Martin (STFC,RAL,SSTD);
go-essp-tech at ucar.edu; cmor at lists.llnl.gov; Kyle.Olivo at noaa.gov;
Doutriaux, Charles
Subject: Re: [Go-essp-tech] CMOR and cell_measures issues

 

Hi Jamie,

I'm arguing that given that cell_measures (or ext_cell_measures) will
*not* appear in files containing fields most likely to be carried on a
mesh different  from the "primary" mesh (because I've removed those from
the requested output table, and hence the CMOR tables), I think it is
better to *assume* the remaining variables are on the "primary" mesh.  I
would be surprised if more than 1% of the variables written will have
cell_measures pointing to an incorrect area field.  If it does, I assume
the area variable will have different latxlon dimensions than the
variable itself, so it will be difficult for a user to mistakenly apply
the areas.
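
The dimension mismatch mentioned above could be tested with a check along these lines. This is only a sketch: the function name is hypothetical, and it assumes that latitude and longitude are the last two dimensions of the data variable, which the thread does not state.

```python
def areas_match_field(field_shape, area_shape):
    """Return True when the trailing (lat, lon) dimensions of a field
    equal the shape of the area variable.

    Assumption: lat and lon are the last two dimensions of the field.
    """
    return tuple(field_shape[-2:]) == tuple(area_shape)
```

A field of shape (12, 145, 192) would match an area variable of shape (145, 192) but not one of shape (144, 192); these grid sizes are illustrative only.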

So rather than advocate completeness over correctness, I'd say I'm
advocating "almost perfect" versus "perfect".

If the number of offending cases is much larger than I'm imagining,
please let me know.

Best regards,
Karl

On 11/1/10 10:09 AM, Kettleborough, Jamie wrote: 

Hello Karl,

 

thanks for this reply.  Putting aside the issue of whether this is
really ext_cell_measures or cell_measures then I think, given the
resources we have locally, we have to make a choice of correctness vs
completeness.  The reason we are tempted to turn off ext_cell_measures
is it is the least effort way we can see of submitting data that is
correct.  I think you are suggesting going for completeness - even if we
risk submitting some data with ext_cell_measures that is incorrect.
Obviously this is *my* interpretation of what you are saying.  Yes we
can go for both correctness and completeness, but this will take us some
effort - we need an extra component in our system that can recognise
which cell areas to assign to which variables (with minimum error) - and
we (like everyone) have lots of demands on our effort at the moment -
and we have to make judgements about where to prioritise.  (This isn't
supposed to be a sob story - just trying to explain why we are
tempted...)

 

Would you recommend 'completeness' over 'correctness' - have I
interpreted you correctly?  What are the options for correcting
incorrect meta-data once data is ingested into ESG?

 

Jamie

 

 

	
________________________________


	From: Karl Taylor [mailto:taylor13 at llnl.gov] 
	Sent: 29 October 2010 21:36
	To: Kettleborough, Jamie
	Cc: Bentley, Philip; V. Balaji; martin.juckes at stfc.ac.uk;
go-essp-tech at ucar.edu; cmor at lists.llnl.gov; Kyle.Olivo at noaa.gov;
Doutriaux, Charles
	Subject: Re: [Go-essp-tech] CMOR and cell_measures issues

	Dear Jamie and Charles (a couple of questions for you),
	
	
	

	Hello Karl,

	 

	I think the recommended way to 'turn off' ext_cell_measures is
to make a call to cmor.set_variable_attribute(varid,
'ext_cell_measures', '').  Is that right?  We are very tempted to do
this for all variables - so basically overriding the MIP tables.  How
big a problem do you think this will be for data users - our grid is
pretty straightforward and users can calculate cell_areas from the
latitudes.
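
For a regular latitude-longitude grid, the calculation a user would have to do can be sketched as below. This is a minimal illustration, not code from the thread: it assumes a spherical Earth, known cell bounds in degrees, and a mean radius of 6.371e6 m (a model may use a slightly different value). The names `cell_areas` and `EARTH_RADIUS_M` are hypothetical.

```python
import math

# Assumed mean Earth radius in metres; individual models may differ slightly.
EARTH_RADIUS_M = 6.371e6


def cell_areas(lat_bounds_deg, lon_bounds_deg, radius=EARTH_RADIUS_M):
    """Return cell areas in m^2 as a list of rows, one row per latitude band.

    lat_bounds_deg and lon_bounds_deg are lists of (lower, upper) bounds
    in degrees. The area of one cell on a sphere is
    R^2 * (lon_hi - lon_lo in radians) * (sin(lat_hi) - sin(lat_lo)).
    """
    areas = []
    for lat_lo, lat_hi in lat_bounds_deg:
        band = math.sin(math.radians(lat_hi)) - math.sin(math.radians(lat_lo))
        row = [radius ** 2 * band * math.radians(lon_hi - lon_lo)
               for lon_lo, lon_hi in lon_bounds_deg]
        areas.append(row)
    return areas
```

Summing the result over a grid that covers the whole globe should recover the surface area of the sphere, which gives a quick sanity check.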

	
	Yes, if the cell areas stored in areacella are not appropriate
for a particular field, and the requested output tables say that
ext_cell_measures includes areacella, then you should call the set
attribute function to reset ext_cell_measures="".  Isn't that right
Charles?
	
	Why are you tempted to turn off the ext_cell_measures for all
variables?  Then your output won't conform to the CMIP5 requirements.
	
	In the latest CMOR tables, I have removed ext_cell_measures from
all the variables that we don't expect always to be on the standard mesh
(i.e., on the grid for which areacella is correct).  This includes
velocities and transports and closely related fields, which are
sometimes staggered relative to areacella.  I would still be interested
in hearing a clear explanation for why there are additional fields
carried on a completely different grid. 
	
	If users must compute the cell areas for only your grid, and for
all others they simply read the areacella field in, then you are
creating a special case that is completely unnecessary.
	
	
	

	 

	That aside, doesn't the approach of providing alternative grid
areas need more discussion?

	 

	  1. how should we produce these.  The most natural approach I
can think of is to modify the fx MIP tables to add in areacellb (or
whatever we choose to call it) and then output through CMOR - this will
maximise the chance of consistency between different grid area files for
any one model. 

	 

	  2. how should we reference these additional areas from a
variable? I could call cmor.set_variable_attribute(varid,
'ext_cell_measures', 'areacellb') - but in the tests I've done on CMOR
2.4 this only does half the job: it puts the appropriate
ext_cell_measures attribute into the file, but does nothing with
associatedFiles.

	I don't think it is a high priority to standardize this
immediately.  We will want CMOR to place the fields in the subdirectory
fx, so I need to check with Charles whether this requires the variable
to appear in table fx.  If not, I would probably build an entirely new
table similar to fx, but with only the additional variables.  This way
you won't have to modify your table if a new fx table comes out.  As for
referencing these additional area variables, I think if you include
area: <area_name> in the ext_cell_measures attribute, then if CMOR isn't
already doing this, Charles can modify construction of associated_files
to include something following the template "<area_name>:
<area_name>_fx_IPSL-CM5_historical_r0i0p0.nc"   What do you think,
Charles?
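
The naming template above can be sketched in Python as follows. This is an illustration of the string format only, not of what CMOR actually does internally; both function names are hypothetical, and the default `r0i0p0` is taken from the examples quoted in this thread.

```python
def associated_files_entry(area_name, model, experiment, rip="r0i0p0"):
    """Build one '<area_name>: <file>' entry following the template
    '<area_name>: <area_name>_fx_<model>_<experiment>_<rip>.nc'."""
    filename = f"{area_name}_fx_{model}_{experiment}_{rip}.nc"
    return f"{area_name}: {filename}"


def ext_cell_measures_value(area_name):
    """Build the value for the ext_cell_measures attribute,
    in the form 'area: <area_name>'."""
    return f"area: {area_name}"
```

For example, `associated_files_entry("areacellb", "IPSL-CM5", "historical")` gives `"areacellb: areacellb_fx_IPSL-CM5_historical_r0i0p0.nc"`.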
	
	
	

	 

	Clearly these may have been things you were going to cover - but
ran out of time to, in which case sorry.

	 

	I think another scenario that still needs some thought is one
where a data provider has submitted data and published it in ESG.  They
then realise they made a mistake - they should have turned
ext_cell_measures off, but didn't (or vice versa). What happens in this
case?  (We have kind of done this in that we have sent data with
incorrect cell_measures to the BADC - but have caught the issue before
ingestion into ESG  - I don't believe we will always be this lucky).
You'll probably see through why I'm asking this question about meta-data
updates again now, so I may as well be explicit... If we choose to turn
off ext_cell_measures for all our diagnostics on this initial submission
- what are our options for recovering from this if we later found the
decision to submit without ext_cell_measures was making our data hard to
use?

	
	Please don't turn off ext_cell_measures (in general).   I think
you could easily write a script to remove the cell_measures attribute
using netCDF tools, but adding it would require rewriting the entire
file.
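
One way such a removal script might look is sketched below. The choice of NCO's `ncatted` is an assumption (the message above only says "netCDF tools"), and the helper name is hypothetical. The function only builds the command line; it could then be passed to `subprocess.run` for each file, preferably on a copy first.

```python
def ncatted_delete_attribute(path, attribute="cell_measures", variable=""):
    """Return the argv list for deleting `attribute` with NCO's ncatted.

    The -a argument has the form att_nm,var_nm,mode,att_type,att_val;
    mode 'd' deletes the attribute. Leaving `variable` empty asks
    ncatted to act on every variable that carries the attribute.
    """
    spec = f"{attribute},{variable},d,,"
    return ["ncatted", "-a", spec, path]
```

Calling `ncatted_delete_attribute("tas.nc")` gives `["ncatted", "-a", "cell_measures,,d,,", "tas.nc"]`; the file name is a placeholder.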
	
	Best regards,
	Karl
	
	
	

	 

	Jamie

		
________________________________


		From: Karl Taylor [mailto:taylor13 at llnl.gov] 
		Sent: 29 October 2010 02:15
		To: Bentley, Philip
		Cc: V. Balaji; martin.juckes at stfc.ac.uk;
go-essp-tech at ucar.edu; cmor at lists.llnl.gov; Kyle.Olivo at noaa.gov;
Doutriaux, Charles; Kettleborough, Jamie
		Subject: Re: [Go-essp-tech] CMOR and cell_measures
issues

		Dear all,
		
		I meant to try to address all the stuff in this
discussion, but won't have time today.  This email is just to say that I
think we should insist that the cell_area files (areacella and
areacello) be placed in the archive, even if there are also gridspec
files.   The ext_cell_measures attribute should also be included for
fields that are on the "standard" grid (i.e., the one with the cell
areas stored in areacella or areacello).  If there are other fields for
which the standard areas are inappropriate and where your scientists
think it is important to provide cell areas, then I recommend that you
create specially named variables and place them in the "fx"
subdirectories.   For variables not on the "standard" grid (i.e., the
grid of areacella or areacello), you should "turn off" the
ext_cell_measures attribute.
		
		I don't expect most groups to produce gridspec files, so
most analysts will be looking for areas in the areacella and areacello
variables, not the gridspec files.  This is why you should write the
areacella and areacello files even if you also write the gridspec files.
		
		Also, could you please explain why you prefer not to
duplicate the "fx" fields in each experiment's directory tree. 
		
		Best regards,
		Karl
		
		On 10/25/10 7:12 AM, Bentley, Philip wrote: 

		Hi Balaji,
		 

			Phil, I'm very impressed that Had will have
gridspec files, 
			is this a done deal? I've been so pessimistic
about this that 
			I was wondering if even we should do one
ourselves.

		Nope, not a done deal yet :-(
		 
		In line with the CMIP5 expt design doc, we don't really
need to provide
		gridspec files since all our model output is on either
regular or
		uniform grids (i.e. simple cartesian product of lat &
long).
		 
		However, this whole cell_measures business prompted me
to revisit the
		gridspec tools and output, which reminded me that the
gridspec netcdf
		files include a cell area variable. Which in turn means
we wouldn't need
		to provide a separate file (or files) for cell areas.
Hence we could
		drop the ext_cell_measures attribute from our CMIP5
output files.
		 
		Using the gridspec tools may be a quick and easy way for
us to provide
		cell area info if we need to.
		 
		Caveat: from a quick glance it looks like the netcdf
files produced by
		the gridspec tools are not CF compliant. Is this an
issue? Presumably
		it is if we want all the data in the CMIP5 archive to be
CF compliant.
		(NB: it could be I'm not running with the very latest
version of the
		tools - but I couldn't see a more recent version on the
gfdl web site).
		 

			You know of course that gridspec says you can
supply
			 

				gridspec_fx_HadGEM2-ES_atm_pgrid.nc
				gridspec_fx_HadGEM2-ES_atm_ugrid.nc
				gridspec_fx_HadGEM2-ES_atm_vgrid.nc
				gridspec_fx_HadGEM2-ES_atm_uvgrid.nc

			as one single supergrid...

		If I could figure out how to output all 7 or 8 atm/ocn
(sub-)grids to a
		single netcdf file I would, but the available
documentation (e.g. for
		make_hgrid) isn't clear on this point. Sorry, that's
probably just me
		being dumb! But if there is updated documentation then
please point me
		to it. If necessary I could concatenate variables
afterwards using NCO
		tools.
		 
		Right now I'm trying to figure out how to create a
gridspec file for our
		HadGEM2 ocean model, which uses a stretched (i.e.
tartan/plaid) grid:
		longitudes are evenly spaced, latitudes vary from 1 deg
to 1/3 deg.
		(Looks like I need to use the --my_grid_file option to
supply the
		lat/long coords).

			But if you're doing gridspec at all, I will
concede this 
			point:-). Let's both do these separate gridspecs
for now.

		Works for me.
		 
		I think we're suffering from 'early-adopter syndrome'
:-/
		 
		Cheers,
		Phil
		 

			Bentley, Philip writes:
			 

				Hi Karl,
				 
				A somewhat belated follow-up question in
connection with 

			this proposal 

				(and with some slight overlap with
Jamie's email which 

			crossed on the 

				ether)...
				 
				As things stand the files named in the
'associated_files' attribute 
				appear thus (using our RCP 4.5
simulation as an example):
				 
				"... gridspecFile:
gridspec_fx_HadGEM2-ES_rcp45_r0i0p0.nc areacella:
				areacella_fx_HadGEM2-ES_rcp45_r0i0p0.nc"
				 
				Are the <expt_id>_<rip> parts (i.e.
'rcp45_r0i0p0.nc' ) actually 
				required? AFAIK, our gridspec/cellarea
files will not 

			change from one 

				simulation to the next using the same
model (HadGEM2-ES in 

			this case).

				Since, like most centers, we will be
running large numbers of 
				simulations using the same model, it
looks like we would need to 
				create numerous duplicates of the
gridspec/cellarea files - 

			or lots of 

				symlinks
				- in order for these references to
make sense. Unless you are 
				planning to manage that on our behalf
somehow...?
				 
				I think our 4 gridspec files for the
HadGEM2 atm grids are 

			likely to 

				be called something like...
				 
				gridspec_fx_HadGEM2-ES_atm_pgrid.nc
				gridspec_fx_HadGEM2-ES_atm_ugrid.nc
				gridspec_fx_HadGEM2-ES_atm_vgrid.nc
				gridspec_fx_HadGEM2-ES_atm_uvgrid.nc
				 
				So without any simulation-specific info.
(There would also be files 
				for the ocean grids)
				 
				As it happens the gridspec files contain
grid cell areas, 

			so I'm now 

				wondering if we'd even supply both?
				 
				I'd be interested to hear your thoughts
on this. I may be 
				mis-understanding something/everything
:-)
				 
				Regards
				Phil

		 

 
