[Go-essp-tech] cmip5

stephen.pascoe at stfc.ac.uk
Mon Sep 12 05:04:16 MDT 2011


Dear Jonathan,

Thanks for taking the time to describe your concerns about the usability of the CMIP5 archive system.  I am CC'ing this to go-essp-tech at ucar.edu as I think your feedback is particularly welcome and insightful and deserves to be seen and discussed widely.

We are aware of many of the shortcomings you identify; improvements in software and documentation are in progress that I hope will improve your experience.  However, our progress has been slower than we'd hoped and we are now up against significant CMIP5 usage, which will inevitably impede rolling out improvements.  We had hoped to have the system more usable by now, and we are pushing hard to improve it as quickly as possible.

You identify several user interface and performance issues with the ESG Gateway search system.  Our colleagues at NCAR have been developing a new version of the Gateway with an improved search backend that I believe solves many of your concerns.  I've seen a test deployment at NCAR and it is a significant improvement.  We at BADC will be deploying it for testing in the next couple of days in the hope that it can be rolled out quickly for end-users.

Another point in your feedback is the scriptability of downloads and of checking what is available.  We had hoped that the wget script generation feature of the Gateway would produce wget scripts that could be edited to download different sorts of data by leveraging the Data Reference Syntax [1].  Unfortunately, although some download URLs contain DRS information that would help in deducing alternative downloads, this isn't practical at present.  We are working to improve the DRS consistency of the archive, which we hope will improve download scriptability.
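
To illustrate what leveraging the DRS could look like, here is a minimal Python sketch that splits a CMIP5 file name into its DRS components.  It assumes the file-name template given in the DRS document [1]: variable, MIP table, model, experiment, ensemble member and an optional time period, joined by underscores.  The function name, the tuple type and the example file name are illustrative and are not part of any ESG tool.

```python
from collections import namedtuple

# Components of a CMIP5 file name, in the order assumed from the DRS document.
DrsName = namedtuple(
    "DrsName", "variable mip_table model experiment ensemble period"
)


def parse_cmip5_filename(filename):
    """Split a CMIP5-style file name into its DRS components.

    The time period is optional (time-independent fields have none),
    in which case `period` is None.
    """
    stem = filename[:-3] if filename.endswith(".nc") else filename
    parts = stem.split("_")
    if len(parts) not in (5, 6):
        raise ValueError("not a DRS-style file name: %r" % filename)
    period = parts[5] if len(parts) == 6 else None
    return DrsName(*parts[:5], period=period)


# Illustrative file name only.
example = parse_cmip5_filename(
    "tas_Amon_HadGEM2-ES_historical_r1i1p1_185912-188411.nc"
)
print(example.model, example.experiment, example.period)
```

A script could parse the names of files already held locally in this way and compare them with the names it finds on the server, so that only the missing files are fetched.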

The other mechanism you could use to programmatically download data and discover new data is reading the THREDDS catalogs.  Every centre serving CMIP5 data is running a THREDDS Data Server (TDS) [2] which lists all download URLs in a network of THREDDS XML catalogs.  This is intended as an internal interface so isn't well documented.  However, I think it is no secret that some users are doing this already.  You can find the THREDDS source catalog of every dataset in the "History" tab of the Gateway's dataset page, or it can be deduced from download URLs and a little knowledge of TDS.

I should add that downloading data directly from a TDS will only work if it is configured to use "tokenless" security.  This is the case with only some datanodes at present but should be fixed in the near term.  
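
As a rough sketch of the catalog-reading approach, the Python below lists the urlPath of each file in a THREDDS catalog and finds the base path of its HTTP file service.  It assumes the standard THREDDS InvCatalog 1.0 XML layout.  The sample catalog, the paths in it and the function names are made up for illustration; a real script would fetch the catalog XML from a datanode and follow any nested catalog references.

```python
import xml.etree.ElementTree as ET

# Namespace used by THREDDS InvCatalog 1.0 documents.
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"


def list_url_paths(catalog_xml):
    """Return the urlPath of every dataset element that has one."""
    root = ET.fromstring(catalog_xml)
    paths = []
    for dataset in root.iter("{%s}dataset" % THREDDS_NS):
        url_path = dataset.get("urlPath")
        if url_path:
            paths.append(url_path)
    return paths


def http_base(catalog_xml):
    """Return the base path of the first HTTPServer service, or None."""
    root = ET.fromstring(catalog_xml)
    for service in root.iter("{%s}service" % THREDDS_NS):
        if service.get("serviceType", "").lower() == "httpserver":
            return service.get("base")
    return None


# Hypothetical catalog, for illustration only.
SAMPLE = """
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         name="example">
  <service name="HTTPServer" serviceType="HTTPServer"
           base="/thredds/fileServer/"/>
  <dataset name="example dataset" ID="example.dataset">
    <dataset name="a.nc" urlPath="example/a.nc"/>
    <dataset name="b.nc" urlPath="example/b.nc"/>
  </dataset>
</catalog>
"""

# Each download path is the service base followed by the dataset urlPath.
for path in list_url_paths(SAMPLE):
    print(http_base(SAMPLE) + path)
```

The host name of the datanode would be prepended to each printed path to form a download URL, and the download itself will only succeed where the "tokenless" security mentioned above is in place.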

In the medium term ESGF are planning documented service APIs that would allow users to query the system programmatically, and there is a new P2P architecture in the works with more focus on scalability [3].

Regards,
Stephen Pascoe.

[1] CMIP5 Data Reference Syntax: http://cmip-pcmdi.llnl.gov/cmip5/docs/cmip5_data_reference_syntax.pdf 
[2] THREDDS Data Server: http://www.unidata.ucar.edu/projects/THREDDS/
[3] ESGF P2P Architecture: http://esgf.org/wiki/ESGF_Index 

---
Stephen Pascoe  +44 (0)1235 445980
Centre of Environmental Data Archival
STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK


-----Original Message-----
From: Jonathan Gregory [mailto:j.m.gregory at reading.ac.uk] 
Sent: 12 September 2011 11:12
To: esg-support at earthsystemgrid.org
Subject: cmip5

Dear ESG

In preparation for working on the 1st draft of the AR5, I have begun to try to
download CMIP5 data. I have to say I am discouraged by the experience. Using
this web interface is slow and inconvenient, and I fear it will be an obstacle
to the work required to be done. The biggest limitation, I would say, is that
there is *only* a web interface. For CMIP3, I used ftp to download the data,
having written my own scripts. That minimised the manual effort required, and
most importantly I could use my script to fetch data I didn't already have,
which it could easily identify. With a web interface, working out what I don't
already have will only be possible by manual comparison, which will take a lot
of time. Is the http protocol that the web interface uses something that could
be employed in a script? If so, could you document it? Even if the protocol is
tricky, I would still much rather write a script than use a web interface, as
in the end it will be more efficient.

However, the web interface could be improved in various ways, I think, which
would make it more efficient. As it stands, I find the following inconvenient:

* The PCMDI gateway is sometimes slow. This morning (UK time) it is terribly
slow - unusable, in fact.

* It always searches when you change any of the criteria, so it searches all
of CMIP5 when you select the "Project", for instance. This wastes time.

* You have to select "all" in order to see the whole list again and make a
new selection, again wasting time with unnecessary searching.

* There is no way to select more than one thing at a time e.g. more than one
experiment or more than one quantity.

* All the datasets have to be ticked individually to proceed to download,
which is tedious.

* If there is more than one page, you can tick only one page at a time, so you
have to start all over again to do the next page, by repeating the whole
search laboriously.

* I can't (yet) get MRI or MIROC data, as it requires some further
authorisation that I have applied for. In fact I applied several days ago,
and I have not yet been authorised. How can I chase this up?

* The search facility at the top seems flaky. The "loading, please wait" never
goes away and it crashes with an http error sometimes.

* Although I would have thought that many users said that CMIP3 would have been
much more convenient if it had been possible to download annual data rather
than monthly - I certainly made this comment - that facility has not been
provided in the CMIP5 interface.

I am sure many people would be grateful if you could make some improvements.
(And I expect I am not the first to make these suggestions!)

Best wishes

Jonathan Gregory

