[Go-essp-tech] CMOR and cell_measures issues

Martina Stockhause martina.stockhause at zmaw.de
Tue Nov 2 07:46:28 MDT 2010


  Dear Karl, dear Martin,

the QC L2 data checker currently checks the metadata in the netcdf 
header against Karl's standard_output.xls columns 'output name', 
'standard name', 'long name', 'axis', 'bounds name', 'cell-methods', 
'type'. So, we only check the explicitly prescribed output requirements 
in the file headers. And CMOR2 compliance is checked in the ESG 
publisher, i.e. during QC L1 checks.
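
A minimal sketch of the kind of header check described above, assuming the requirements table has been loaded into a dictionary keyed by output name. The table excerpt, the function name check_header and the attribute values are illustrative assumptions, not the actual QC L2 code or the contents of standard_output.xls.

```python
# Hypothetical excerpt of the requirements table, keyed by output name.
# The values are illustrative; they are not copied from standard_output.xls.
REQUIRED = {
    "tas": {
        "standard_name": "air_temperature",
        "long_name": "Near-Surface Air Temperature",
        "cell_methods": "time: mean",
    },
}


def check_header(output_name, header_attrs, required=REQUIRED):
    """Return a list of mismatch messages; an empty list means the header passes."""
    expected = required.get(output_name)
    if expected is None:
        return [f"{output_name}: not a requested output name"]
    problems = []
    for attr, want in expected.items():
        got = header_attrs.get(attr)
        if got != want:
            problems.append(f"{output_name}: {attr} is {got!r}, expected {want!r}")
    return problems


# Attributes as they might be read from a netCDF header (made-up example).
good = {
    "standard_name": "air_temperature",
    "long_name": "Near-Surface Air Temperature",
    "cell_methods": "time: mean",
}
bad = dict(good, standard_name="temperature")

print(check_header("tas", good))
print(check_header("tas", bad))
```

Only attributes listed in the table are compared, which mirrors the point that only explicitly prescribed requirements are checked.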

Which additional checks would be required and reasonable for CMIP5/CMOR2 
data, which might not be fully CF compliant itself? Should these 
additional checks be performed during QC Level 1 or during QC level 2 
checks? If we decide to include a CF compliance check, I would see it as 
a general data requirement and place it before or together with the 
CMOR2 compliance check, which would mean as part of the QC L1 checks. 
All requirements and our QC checks of the data should be reflected in 
the standard_output document especially for those modeling centers not 
using CMOR2 to produce their output.

Nevertheless, we would like to have a look at the checks of the CF 
checker. Does anyone know where we can get cfchecks.py? I cannot find 
it at http://cf-pcmdi.llnl.gov/conformance or 
http://cf-pcmdi.llnl.gov/conformance/compliance-checker.

Best wishes,
Martina


On 11/02/2010 10:14 AM, martin.juckes at stfc.ac.uk wrote:
> Hello Karl,
>
>
>
> It's true that there is no explicit requirement for complete
> CF-compliance in the QC level 2 documentation.
>
>
>
> I'll assume for the time being that there is no option of changing the
> CF 1.4 conformance document, so that CMOR v2.2 output will remain
> non-compliant even if the CF checker is adjusted to pass it with a
> warning.
>
>
>
> In terms of what is easiest for users, I believe that having accurate
> metadata in the file is crucial. I can see Jonathan's point that
> uniformity across the archive is also important, but I would rate that
> as a lower priority. Given our general commitment to promoting CF
> compliant data, I think that it would be a big mistake to keep producing
> non-compliant data now that we have identified the problem. Especially
> when the metadata in the files explicitly states that it is CF
> compliant.
>
>
>
> It appears that the volume of CMOR v2.2 data is too large for easy
> correction, so perhaps clearly documenting the departure from the CF
> convention is the best option - though this is not ideal as there is
> nothing in the files which will tell users where to find the relevant
> information.
>
>
>
> I also think that the departure from CF compliance should be documented
> in the QC metadata - but that can be dealt with later.
>
>
>
> Regards,
>
> Martin
>
> From: Karl Taylor [mailto:taylor13 at llnl.gov]
> Sent: 01 November 2010 22:13
> To: Juckes, Martin (STFC,RAL,SSTD)
> Cc: jamie.kettleborough at metoffice.gov.uk; Lawrence, Bryan
> (STFC,RAL,SSTD); lautenschlager at dkrz.de;
> philip.bentley at metoffice.gov.uk; V.Balaji at noaa.gov;
> go-essp-tech at ucar.edu; cmor at lists.llnl.gov; Kyle.Olivo at noaa.gov;
> Doutriaux, Charles
> Subject: Re: [Go-essp-tech] CMOR and cell_measures issues
>
>
>
> Couldn't we simply say that we certify that CMIP5 data conforms to the
> CF 1.4 standard except that the cell_measures variables may  be found in
> an external file, rather than the referencing file.  That way the data
> will pass the CMIP5 QC checks which  don't include requiring the
> cell_measures variables to be found in the referencing file.   I think
> the decision between cell_measures and ext_cell_measures should be based
> on which one will be most useful to the users.  In CMIP5, users should
> be able to find the cell areas even without cell_measures, so I'm not
> sure this decision is all that critical.
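
The relaxed rule proposed here could be sketched as follows: parse the cell_measures string and report, for each named variable, whether it is in the referencing file, only in an external (associated) file, or nowhere. The function names and the three labels are assumptions made for illustration; this is not the CF checker's or the CMIP5 QC tool's logic.

```python
def parse_cell_measures(value):
    """Parse 'area: areacella volume: volcello' into {'area': 'areacella', ...}."""
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected 'measure: name' pairs, got {value!r}")
    measures = {}
    for key, name in zip(tokens[0::2], tokens[1::2]):
        if not key.endswith(":"):
            raise ValueError(f"measure {key!r} is missing its trailing colon")
        measures[key[:-1]] = name
    return measures


def classify_measures(cell_measures, file_variables, associated_variables):
    """Label each measure variable as 'internal', 'external' or 'missing'."""
    result = {}
    for name in parse_cell_measures(cell_measures).values():
        if name in file_variables:
            result[name] = "internal"   # strict CF 1.4 reading is satisfied
        elif name in associated_variables:
            result[name] = "external"   # allowed only under the proposed exception
        else:
            result[name] = "missing"
    return result


print(classify_measures("area: areacella", {"tas", "lat", "lon"}, {"areacella"}))
```

Under the proposed certification, 'external' would pass with a documented departure from CF 1.4, while 'missing' would still be an error.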
>
> regards,
> Karl
>
>
>
> On 11/1/10 2:30 PM, martin.juckes at stfc.ac.uk wrote:
>
> Hello All,
>
>
>
> Sorry to be repetitive, but I want to repeat a question I raised earlier
> today (Monday in the UK) and hasn't been answered yet: will the proposed
> change to the CF checker be matched to a change to the conformance
> document so that the CF 1.4 conformance no longer demands that variables
> named in cell_measures be in the same file?
>
>
>
> I've also copied Bryan and Michael in again, so as to get a quality control
> perspective - as it worries me that an agreement made in a rush might
> not meet the expectations of the quality control we have committed to.
>
>
>
> Regards,
>
> Martin
>
>
>
> From: Karl Taylor [mailto:taylor13 at llnl.gov]
> Sent: 01 November 2010 18:19
> To: Kettleborough, Jamie
> Cc: Bentley, Philip; V. Balaji; Juckes, Martin (STFC,RAL,SSTD);
> go-essp-tech at ucar.edu; cmor at lists.llnl.gov; Kyle.Olivo at noaa.gov;
> Doutriaux, Charles
> Subject: Re: [Go-essp-tech] CMOR and cell_measures issues
>
>
>
> Hi Jamie,
>
> I'm arguing that given that cell_measures (or ext_cell_measures) will
> *not* appear in files containing fields most likely to be carried on a
> mesh different  from the "primary" mesh (because I've removed those from
> the requested output table, and hence the CMOR tables), I think it is
> better to *assume* the remaining variables are on the "primary" mesh.  I
> would be surprised if more than 1% of the variables written will have
> cell_measures pointing to an incorrect area field.  If it does, I assume
> the area variable will have different latxlon dimensions than the
> variable itself, so it will be difficult for a user to mistakenly apply
> the areas.
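
The safeguard mentioned here, that a mismatched area field will usually have a different lat x lon shape, can be sketched as a simple shape comparison. This assumes the horizontal dimensions are the trailing dimensions of the data variable; the function name and the grid sizes in the example are illustrative.

```python
def area_fits_variable(var_shape, area_shape):
    """True if the variable's trailing dimensions equal the area field's shape."""
    n = len(area_shape)
    if n == 0 or len(var_shape) < n:
        return False
    return tuple(var_shape[-n:]) == tuple(area_shape)


# A (time, lat, lon) field on the same mesh as the area field.
print(area_fits_variable((12, 145, 192), (145, 192)))
# A field staggered in latitude: one row fewer than the area field.
print(area_fits_variable((12, 144, 192), (145, 192)))
```

A staggered grid that happens to have the same shape as the primary mesh would not be caught by this test, which is the residual risk the paragraph accepts.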
>
> So rather than advocate completeness over correctness, I'd say I'm
> advocating "almost perfect" versus "perfect".
>
> If the number of offending cases is much larger than I'm imagining,
> please let me know.
>
> Best regards,
> Karl
>
> On 11/1/10 10:09 AM, Kettleborough, Jamie wrote:
>
> Hello Karl,
>
>
>
> thanks for this reply.  Putting aside the issue of whether this is
> really ext_cell_measures or cell_measures then I think, given the
> resources we have locally, we have to make a choice of correctness vs
> completeness.  The reason we are tempted to turn off ext_cell_measures
> is it is the least effort way we can see of submitting data that is
> correct.  I think you are suggesting going for completeness - even if we
> risk submitting some data with ext_cell_measures that is incorrect.
> Obviously this is *my* interpretation of what you are saying.  Yes we
> can go for both correctness and completeness, but this will take us some
> effort - we need an extra component in our system that can recognise
> which cell areas to assign to which variables (with minimum error) - and
> we (like everyone) have lots of demands on our effort at the moment -
> and we have to make judgements about where to prioritise.  (This isn't
> supposed to be a sob story - just trying to explain why we are
> tempted...)
>
>
>
> Would you recommend 'completeness' over 'correctness' - have I
> interpreted you correctly?  What are the options for correcting
> incorrect meta-data once data is ingested into ESG?
>
>
>
> Jamie
>
>
>
>
>
> 	
> ________________________________
>
>
> 	From: Karl Taylor [mailto:taylor13 at llnl.gov]
> 	Sent: 29 October 2010 21:36
> 	To: Kettleborough, Jamie
> 	Cc: Bentley, Philip; V. Balaji; martin.juckes at stfc.ac.uk;
> go-essp-tech at ucar.edu; cmor at lists.llnl.gov; Kyle.Olivo at noaa.gov;
> Doutriaux, Charles
> 	Subject: Re: [Go-essp-tech] CMOR and cell_measures issues
>
> 	Dear Jamie and Charles (a couple of questions for you),
> 	
> 	
> 	
>
> 	Hello Karl,
>
> 	
>
> 	I think the recommended way to 'turn off' ext_cell_measures is
> to make a call to cmor.set_variable_attribute(varid,
> 'ext_cell_measures', '').  Is that right?  We are very tempted to do
> this for all variables - so basically overriding the MIP tables.  How
> big a problem do you think this will be for data users - our grid is
> pretty straightforward and users can calculate cell_areas from the
> latitudes.
>
> 	
> 	Yes, if the cell areas stored in areacella are not appropriate
> for a particular field, and the requested output tables say that
> ext_cell_measures includes areacella, then you should call the set
> attribute function to reset ext_cell_measures="".  Isn't that right
> Charles?
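
A minimal sketch of resetting the attribute only for variables that are not on the standard grid. The setter is passed in as an argument so the example runs without CMOR installed; in real use one would pass cmor.set_variable_attribute, assuming the installed CMOR version accepts the three-argument form quoted above. The helper name and the on_standard_grid mapping are hypothetical.

```python
def disable_cell_measures(set_variable_attribute, var_ids, on_standard_grid):
    """Blank ext_cell_measures for variables flagged as off the standard grid.

    var_ids maps variable names to the ids returned when the variables were
    defined; on_standard_grid maps names to True/False (default True).
    Returns the names that were changed.
    """
    changed = []
    for name, var_id in var_ids.items():
        if not on_standard_grid.get(name, True):
            set_variable_attribute(var_id, "ext_cell_measures", "")
            changed.append(name)
    return changed


# Stand-in for the real setter: it only records the calls it receives.
calls = []


def fake_setter(var_id, name, value):
    calls.append((var_id, name, value))


changed = disable_cell_measures(
    fake_setter,
    {"tas": 0, "ua": 1},
    {"tas": True, "ua": False},
)
print(changed, calls)
```

Keeping the default at True means the attribute stays in place unless a variable is explicitly flagged, in line with the request not to turn it off for all variables.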
> 	
> 	Why are you tempted to turn off the ext_cell_measures for all
> variables?  Then your output won't conform to the CMIP5 requirements.
> 	
> 	In the latest CMOR tables, I have removed ext_cell_measures from
> all the variables that we don't expect always to be on the standard mesh
> (i.e., on the grid for which areacella is correct).  This includes
> velocities and transports and closely related fields, which are
> sometimes staggered relative to areacella.  I would still be interested
> in hearing a clear explanation for why there are additional fields
> carried on a completely different grid.
> 	
> 	If users must compute the cell areas for only your grid, and for
> all others they simply read the areacella field in, then you are
> creating a special case that is completely unnecessary.
> 	
> 	
> 	
>
> 	
>
> 	That aside, doesn't the approach of providing alternative grid
> areas need more discussion?
>
> 	
>
> 	  1. how should we produce these.  The most natural approach I
> can think of is to modify the fx MIP tables to add in areacellb (or
> whatever we choose to call it) and then output through CMOR - this will
> maximise the chance of consistency between different grid area files for
> any one model.
>
> 	
>
> 	  2. how should we reference these additional areas from a
> variable? I could call cmor.set_variable_attribute(varid,
> 'ext_cell_measures', 'areacellb') - but in the tests I've done on CMOR
> 2.4 this only does half the job: it puts the appropriate
> ext_cell_measures attribute into the file, but does nothing with
> associatedFiles.
>
> 	I don't think it is a high priority to standardize this
> immediately.  We will want CMOR to place the fields in the subdirectory
> fx, so I need to check with Charles whether this requires the variable
> to appear in table fx.  If not, I would probably build an entirely new
> table similar to fx, but with only the additional variables.  This way
> you won't have to modify your table if a new fx table comes out.  As for
> referencing these additional area variables, I think if you include
> area:<area_name>  in the ext_cell_measures attribute, then if CMOR isn't
> already doing this, Charles can modify construction of associated_files
> to include something following the template "<area_name>:
> <area_name>_fx_IPSL-CM5_historical_r0i0p0.nc"   What do you think,
> Charles?
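
The suggested template can be sketched as a small string builder. The function name and parameters are assumptions for illustration, and this is not how CMOR actually constructs associated_files; it only shows the mapping from the ext_cell_measures value to the proposed entries.

```python
def associated_file_entries(ext_cell_measures, model, experiment, rip="r0i0p0"):
    """Build '<area_name>: <area_name>_fx_<model>_<experiment>_<rip>.nc' entries.

    ext_cell_measures is expected as 'measure: name' pairs, e.g. 'area: areacellb'.
    """
    tokens = ext_cell_measures.split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected 'measure: name' pairs, got {ext_cell_measures!r}")
    names = tokens[1::2]  # every second token is a variable name
    return " ".join(f"{n}: {n}_fx_{model}_{experiment}_{rip}.nc" for n in names)


print(associated_file_entries("area: areacellb", "IPSL-CM5", "historical"))
```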
> 	
> 	
> 	
>
> 	
>
> 	Clearly these may have been things you were going to cover - but
> ran out of time to, in which case sorry.
>
> 	
>
> 	I think another scenario that still needs some thought is one
> where a data provider has submitted data and published it in ESG.  They
> then realise they made a mistake - they should have turned
> ext_cell_measures off, but didn't (or vice versa). What happens in this
> case?  (We have kind of done this in that we have sent data with
> incorrect cell_measures to the BADC - but have caught the issue before
> ingestion into ESG  - I don't believe we will always be this lucky).
> You'll probably see through why I'm asking this question about meta-data
> updates again now, so I may as well be explicit... If we choose to turn
> off ext_cell_measures for all our diagnostics on this initial submission
> - what are our options for recovering from this if we later found the
> decision to submit without ext_cell_measures was making our data hard to
> use?
>
> 	
> 	Please don't turn off ext_cell_measures (in general).   I think
> you could easily write a script to remove the cell_measures attribute
> using netCDF tools, but adding it would require rewriting the entire
> file.
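
One way such a removal script might look, sketched in Python around the NCO ncatted tool. The argument string follows the att_nm,var_nm,mode,att_type,att_val form as I understand the NCO documentation (blank var_nm meaning every variable, mode d meaning delete); please check it against the installed NCO version before relying on it. The helper names are hypothetical.

```python
import subprocess


def ncatted_delete_command(path, attribute="cell_measures"):
    """Return the argument list that asks ncatted to delete `attribute`
    from every variable in `path` (the file is edited in place)."""
    return ["ncatted", "-O", "-a", f"{attribute},,d,,", path]


def strip_attribute(paths, attribute="cell_measures", run=subprocess.run):
    """Run the delete command on each file; `run` can be swapped for a dry run."""
    for path in paths:
        run(ncatted_delete_command(path, attribute), check=True)


# Dry run: show the command without touching any file.
print(" ".join(ncatted_delete_command("tas_example.nc")))
```

Passing attribute="ext_cell_measures" covers the case where that is the attribute name actually written to the files.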
> 	
> 	Best regards,
> 	Karl
> 	
> 	
> 	
>
> 	
>
> 	Jamie
>
> 		
> ________________________________
>
>
> 		From: Karl Taylor [mailto:taylor13 at llnl.gov]
> 		Sent: 29 October 2010 02:15
> 		To: Bentley, Philip
> 		Cc: V. Balaji; martin.juckes at stfc.ac.uk;
> go-essp-tech at ucar.edu; cmor at lists.llnl.gov; Kyle.Olivo at noaa.gov;
> Doutriaux, Charles; Kettleborough, Jamie
> 		Subject: Re: [Go-essp-tech] CMOR and cell_measures
> issues
>
> 		Dear all,
> 		
> 		I meant to try to address all the stuff in this
> discussion, but won't have time today.  This email is just to say that I
> think we should insist that the cell_area files (areacella and
> areacello) be placed in the archive, even if there are also gridspec
> files.   The ext_cell_measures attribute should also be included for
> fields that are on the "standard" grid (i.e., the one with the cell
> areas stored in areacella or areacello).  If there are other fields for
> which the standard areas are inappropriate and where your scientists
> think it is important to provide cell areas, then I recommend that you
> create specially named variables and place them in the "fx"
> subdirectories.   For variables not on the "standard" grid (i.e., the
> grid of areacella or areacello), you should "turn off" the
> ext_cell_measures attribute.
> 		
> 		I don't expect most groups to produce gridspec files, so
> most analysts will be looking for areas in the areacella and areacello
> variables, not the gridspec files.  This is why you should write the
> areacella and areacello files even if you also write the gridspec files.
> 		
> 		Also, could you please explain why you prefer not to
> duplicate the "fx" fields in each experiment's directory tree.
> 		
> 		Best regards,
> 		Karl
> 		
> 		On 10/25/10 7:12 AM, Bentley, Philip wrote:
>
> 		Hi Balaji,
> 		
>
> 			Phil, I'm very impressed that Had will have gridspec files,
> 			is this a done deal? I've been so pessimistic about this that
> 			I was wondering if even we should do one ourselves.
>
> 		Nope, not a done deal yet :-(
> 		
> 		In line with the CMIP5 expt design doc, we don't really need to provide
> 		gridspec files since all our model output is on either regular or
> 		uniform grids (i.e. simple cartesian product of lat & long).
> 		
> 		However, this whole cell_measures business prompted me to revisit the
> 		gridspec tools and output, which reminded me that the gridspec netcdf
> 		files include a cell area variable. Which in turn means we wouldn't need
> 		to provide a separate file (or files) for cell areas. Hence we could
> 		drop the ext_cell_measures attribute from our CMIP5 output files.
> 		
> 		Using the gridspec tools may be a quick and easy way for us to provide
> 		cell area info if we need to.
> 		
> 		Caveat: from a quick glance it looks like the netcdf files produced by
> 		the gridspec tools are not CF compliant. Is this an issue? Presumably
> 		it is if we want all the data in the CMIP5 archive to be CF compliant.
> 		(NB: it could be I'm not running with the very latest version of the
> 		tools - but I couldn't see a more recent version on the gfdl web site).
> 		
>
> 			You know of course that gridspec says you can supply
> 			
>
> 				gridspec_fx_HadGEM2-ES_atm_pgrid.nc
> 				gridspec_fx_HadGEM2-ES_atm_ugrid.nc
> 				gridspec_fx_HadGEM2-ES_atm_vgrid.nc
> 				gridspec_fx_HadGEM2-ES_atm_uvgrid.nc
>
> 			as one single supergrid...
>
> 		If I could figure out how to output all 7 or 8 atm/ocn (sub-)grids to a
> 		single netcdf file I would, but the available documentation (e.g. for
> 		make_hgrid) isn't clear on this point. Sorry, that's probably just me
> 		being dumb! But if there is updated documentation then please point me
> 		to it. If necessary I could concatenate variables afterwards using NCO
> 		tools.
> 		
> 		Right now I'm trying to figure out how to create a gridspec file for our
> 		HadGEM2 ocean model, which uses a stretched (i.e. tartan/plaid) grid:
> 		longitudes are evenly spaced, latitudes vary from 1 deg to 1/3 deg.
> 		(Looks like I need to use the --my_grid_file option to supply the
> 		lat/long coords).
>
> 			But if you're doing gridspec at all, I will concede this
> 			point:-). Let's both do these separate gridspecs for now.
>
> 		Works for me.
> 		
> 		I think we're suffering from 'early-adopter syndrome' :-/
> 		
> 		Cheers,
> 		Phil
> 		
>
> 			Bentley, Philip writes:
> 			
>
> 				Hi Karl,
> 				
> 				A somewhat belated follow-up question in connection with this
> 				proposal (and with some slight overlap with Jamie's email which
> 				crossed on the ether)...
> 				
> 				As things stand the files named in the 'associated_files'
> 				attribute appear thus (using our RCP 4.5 simulation as an
> 				example):
> 				
> 				"... gridspecFile: gridspec_fx_HadGEM2-ES_rcp45_r0i0p0.nc
> 				areacella: areacella_fx_HadGEM2-ES_rcp45_r0i0p0.nc"
> 				
> 				Are the <expt_id>_<rip> parts (i.e. 'rcp45_r0i0p0.nc') actually
> 				required? AFAIK, our gridspec/cellarea files will not change from
> 				one simulation to the next using the same model (HadGEM2-ES in
> 				this case). Since, like most centers, we will be running large
> 				numbers of simulations using the same model, it looks like we
> 				would need to create numerous duplicates of the gridspec/cellarea
> 				files - or lots of symlinks - in order for these references to
> 				make sense. Unless you are planning to manage that on our behalf
> 				somehow...?
> 				
> 				I think our 4 gridspec files for the HadGEM2 atm grids are likely
> 				to be called something like...
> 				
> 				gridspec_fx_HadGEM2-ES_atm_pgrid.nc
> 				gridspec_fx_HadGEM2-ES_atm_ugrid.nc
> 				gridspec_fx_HadGEM2-ES_atm_vgrid.nc
> 				gridspec_fx_HadGEM2-ES_atm_uvgrid.nc
> 				
> 				So without any simulation-specific info. (There would also be
> 				files for the ocean grids)
> 				
> 				As it happens the gridspec files contain grid cell areas, so I'm
> 				now wondering if we'd even supply both?
> 				
> 				I'd be interested to hear your thoughts on this. I may be
> 				mis-understanding something/everything :-)
> 				
> 				Regards
> 				Phil
>
> 		
>
>
>
>
>
> _______________________________________________
> GO-ESSP-TECH mailing list
> GO-ESSP-TECH at ucar.edu
> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech

-- 
----------- DKRZ / Data Management -----------

Martina Stockhause
Deutsches Klimarechenzentrum
Bundesstr. 45a
D-20146 Hamburg
Germany

phone:	+49-40-460094-122
FAX:	+49-40-460094-106
e-mail:	martina.stockhause at zmaw.de

----------------------------------------------
