[Go-essp-tech] cmip5

V. Balaji V.Balaji at noaa.gov
Wed Sep 14 09:33:42 MDT 2011


Sebastien clearly points out the infeasibility of a per-dataset selection
capability. A 'Select All' button should satisfy the need for the most
part: narrow down your search using the facets, then generate a
single-click download or a single wget script.

That is how it is implemented in the GFDL data portal.
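
As a rough illustration of that workflow (this is not the portal's actual
code), the sketch below filters a hypothetical in-memory file index by facet
values and emits one wget script covering every match, whichever dataset each
file belongs to. The record layout, the names select_all and wget_script, and
the example.org URLs are all assumptions made up for this example.

```python
# Hypothetical file index: one record per file, tagged with its facets.
FILES = [
    {"model": "modelA", "experiment": "decadal1960", "variable": "tas",
     "url": "http://example.org/modelA/decadal1960/tas.nc"},
    {"model": "modelB", "experiment": "decadal1960", "variable": "tas",
     "url": "http://example.org/modelB/decadal1960/tas.nc"},
    {"model": "modelA", "experiment": "decadal1965", "variable": "tas",
     "url": "http://example.org/modelA/decadal1965/tas.nc"},
    {"model": "modelB", "experiment": "decadal1960", "variable": "pr",
     "url": "http://example.org/modelB/decadal1960/pr.nc"},
]


def select_all(files, **facets):
    """Return every file whose value for each given facet is in the allowed set."""
    return [f for f in files
            if all(f[name] in allowed for name, allowed in facets.items())]


def wget_script(files):
    """Build a single shell script with one resumable wget call per file."""
    lines = ["#!/bin/sh"]
    lines += ["wget -c '%s'" % f["url"] for f in files]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Narrow down with facets, then "select all" matches across models.
    hits = select_all(FILES, experiment={"decadal1960"}, variable={"tas"})
    print(wget_script(hits))
```

The point of the design is that the selection is expressed once, as facets,
and the script is generated from the matching files rather than dataset by
dataset.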

Sébastien Denvil writes:

> Hi Eric,
>
> see comments and feedback below.
>
> The purpose of this email is to give an end-user perspective and to recall the
> "multi-model intercomparison" essence of CMIP5.
>
> I just hope it can be taken into account in the Gateway roadmap.
>
> On 13/09/2011 22:40, Eric Nienhouse wrote:
>> Dear All,
>> 
>> Thanks everyone for your thorough and thoughtful discussion on this
>> issue.  This support request details a number of usability issues
>> related to cross-cutting data discovery and access.  Much can be done
>> now to improve user experience as we all collectively work towards a
>> more robust system which addresses these community needs.
>> 
>> Please consider the following:
>> 
>> * Install/upgrade to Gateway version 1.3.2.  (This version addresses
>> several frequent end user support issues.)
>> * Enable certificate authorization (token-less) at all Gateways and Data
>> Nodes.
>> * Try out the Gateway 2.0 Beta and provide feedback:
>> (http://search-esg.prototype.ucar.edu)
>
> CMIP5 being a multi-model approach, what end users are looking for is a way
> to access given quantities (air temperature and liquid precipitation, for
> example) from a given experiment for all available models (reaching that point
> with a minimum number of clicks is key to success). With that as the primary
> requirement, the final step of the end-user scenario is:
> - the end user gets a wget script that points to files across datasets.
> - the end user starts a Globus Online download that points to files across datasets.
> - and so on.
>
> Maybe the dataset-centric view doesn't ease that approach, but we think this
> should be the way to go. I can't log in with my OpenID, so it could be that the
> download options have been changed in v2, but I can't test it.
>
> See below an example.
>
>> * Install the Gateway 2.0 Beta for federation testing and review.
>> 
>> It will also be of great help to our end users to work towards
>> consistent DRS based file URLs, improved user documentation and
>> providing services to support bulk download.  Much of this is already
>> under way.
>> 
>> Improving the help documentation for certificate WGet access and adding
>> Globus Online as a download option is work in progress for the next
>> Gateway release 1.3.3.  This should be available very soon.
>
> What is the scenario for Globus Online? One "download job" per dataset or one
> "download job" per set of search criteria?
>
> Example:
>
> - I'm interested in the short-term simulations (18 experiments), all ensemble
> members (let's say 6 members on average across models for each experiment)
>
> experiments="decadal1960 decadal1965 decadal1970 decadal1975 decadal1980 
> decadal1985 decadal1990 decadal1995 decadal2000 decadal2001 decadal2002 
> decadal2003 decadal2004 decadal2005 decadal2006 decadal2007 decadal2008 
> decadal2009"
>
> - I like those variables (6 variables from the same table!)
>
> 2D_variables[atmos][mon]="tas ts pctisccp"
> 3D_variables[atmos][mon]="ta hur clcalipso parasolRefl"
>
> - I want to analyse all the models distributing what's above (let's say 14
> models)
>
> ===> So I am interested in 9072 datasets.
>
> One download per set of search criteria should trigger one download job using
> 10 clicks.
> One download per dataset will trigger 9072 download jobs and something like
> O(100000) clicks. This is not possible.
>
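
To make the arithmetic in this example explicit, here is a small sketch that
reproduces the 9072 figure and the click estimate. It uses the round numbers
given in the message (18 experiments, about 6 members, 14 models, and the
stated count of 6 variables). The cost of roughly 10 clicks per download job
is an assumption carried over from the per-search case, and the variable
names are only for illustration.

```python
# Round numbers quoted in the message above.
experiments = 18   # decadal1960 ... decadal2009
members = 6        # average ensemble size per experiment (estimate)
models = 14        # models distributing these variables (estimate)
variables = 6      # variable count as stated in the message

# One dataset per (experiment, member, model, variable) combination.
datasets = experiments * members * models * variables

# ASSUMPTION: every download job costs about 10 clicks, whatever its size.
clicks_per_job = 10
clicks_per_search = 1 * clicks_per_job          # one job for the whole search
clicks_per_dataset = datasets * clicks_per_job  # one job per dataset

print(datasets, clicks_per_search, clicks_per_dataset)  # 9072 10 90720
```

Under that assumption the per-dataset route costs 90720 clicks, which is the
O(100000) estimate quoted above.
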
> Best regards.
> Sébastien
>
>> Thank you for your time and help.
>> 
>> Kind regards,
>> 
>> -Eric
>> 
>> 
>> Williams, Dean N. wrote:
>>> Hi Jonathan,
>>>
>>>      I know this can be a pain now, but we are working to improve the
>>> situation.
>>> I was also informed that others on the ESGF team are working to help the
>>> data movement/download situation. This help comes in the form of Globus
>>> Online (GO) and Data Mover-Lite (DML). For example, DML also supports the
>>> features listed by IPSL:
>>>          * support for myproxy-logon.
>>>          * simple data selection by model, experiment, realm, variable, etc.
>>>            in a simple tree search.
>>>          * multi-threaded downloads.
>>>            NOTE: dml-webstart only supports downloading small files, but the
>>>            standalone version supports downloading big files with
>>>            multithreaded support.
>>>          * incremental processing (i.e., downloading only non-existing files)
>>> 
>>>
>>>      The features the DML team is working to incorporate are:
>>>          * manage dataset versions following the new DRS
>>>          * download history stored in a database
>>> 
>>>
>>>      We will keep you and the community abreast of the new features as
>>> they become available.
>>> 
>>> Best regards,
>>>      Dean
>>> 
>>> On 9/12/11 10:12 AM, "Jonathan Gregory"<j.m.gregory at reading.ac.uk>
>>> wrote:
>>> 
>>> 
>>>> Dear Stephen, Dean, Sebastien, Stephane     cc Jamie, Martin
>>>> 
>>>> Thank you very much for your emails. I am grateful to Stephen and Dean for
>>>> responding positively and constructively to my email, despite its being a
>>>> list of complaints. I'll certainly try out Sebastien's program, and
>>>> Stephane's too if you are willing to make it available; it's very helpful
>>>> that you have written these.
>>>> 
>>>> Best wishes
>>>> 
>>>> Jonathan
>>>> 
>>> 
>>> Dear Jonathan,
>>> 
>>> Thanks for taking the time to describe your concerns about the usability 
>>> of the CMIP5 archive system.  I am CC'ing this to go-essp-tech at ucar.edu as 
>>> I think your feedback is particularly welcome and insightful and deserves 
>>> to be seen and discussed widely.
>>> 
>>> We are aware of many of the shortcomings you identify; improvements in 
>>> software and documentation are in progress that I hope will improve your 
>>> experience.  However, our progress has been slower than we'd hoped and we 
>>> are now up against significant CMIP5 usage which will inevitably impede 
>>> rolling-out improvements.  We would have hoped to have the system more 
>>> usable by now but we are pushing hard to improve the system as quickly as 
>>> possible.
>>> 
>>> You identify several user interface and performance issues with the ESG 
>>> Gateway search system.  Our colleagues at NCAR have been developing a new 
>>> version of the Gateway with an improved search backend that I believe 
>>> solves many of your concerns.  I've seen a test deployment at NCAR and it 
>>> is a significant improvement.  We at BADC will be deploying it for testing 
>>> in the next couple of days in the hope that it can be rolled-out quickly 
>>> for end-users.
>>> 
>>> Another point in your feedback is scriptability of downloads and checking 
>>> what is available.  We had hoped that the wget script generation feature 
>>> of the gateway would produce wget scripts that could be edited to download 
>>> different sorts of data by leveraging the Data Reference Syntax [1]. 
>>> Unfortunately, although some download URLs contain DRS information that
>>> would help in deducing alternative downloads, this isn't practical at
>>> present.  We are working to improve the DRS consistency of the archive,
>>> which we hope will improve download scriptability.
>>> 
>>> The other mechanism you could use to programmatically download data and 
>>> discover new data is reading the THREDDS catalogs.  Every centre serving 
>>> CMIP5 data is running a THREDDS Data Server [2] which lists all download 
>>> URLs in a network of THREDDS XML catalogs.  This is intended as an 
>>> internal interface so isn't well documented.  However, I think it is no 
>>> secret that some users are doing this already.  You can find the THREDDS 
>>> source catalog of every dataset in the "History" tab of the Gateway's 
>>> dataset page or they can be deduced from download URLs and a little 
>>> knowledge of TDS.
>>> 
>>> I should add that downloading data directly from a TDS will only work if 
>>> it is configured to use "tokenless" security.  This is the case with only 
>>> some datanodes at present but should be fixed in the near term.
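
For anyone who wants to try the THREDDS route described above, here is a
minimal sketch of pulling download URLs out of a catalog with Python's
standard library. It parses a small embedded sample rather than a live
catalog: the sample document, the datanode.example.org host and the
download_urls helper are assumptions for illustration, and real catalogs are
larger and may organise services and access elements differently. Only the
InvCatalog v1.0 namespace and the service "base" plus dataset "urlPath"
convention are taken from the THREDDS catalog format itself.

```python
import xml.etree.ElementTree as ET

# Namespace used by THREDDS InvCatalog v1.0 documents.
NS = {"t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}

# Hypothetical, heavily trimmed catalog standing in for a real one.
SAMPLE = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <service name="HTTPServer" serviceType="HTTPServer" base="/thredds/fileServer/"/>
  <dataset name="example dataset">
    <dataset name="tas_example.nc" urlPath="cmip5/example/tas_example.nc"/>
    <dataset name="ts_example.nc" urlPath="cmip5/example/ts_example.nc"/>
  </dataset>
</catalog>"""


def download_urls(catalog_xml, host):
    """List HTTP download URLs for every dataset that carries a urlPath."""
    root = ET.fromstring(catalog_xml)
    # The HTTPServer service supplies the path prefix for plain downloads.
    service = root.find(".//t:service[@serviceType='HTTPServer']", NS)
    base = service.get("base")
    return [host + base + ds.get("urlPath")
            for ds in root.iter("{%s}dataset" % NS["t"])
            if ds.get("urlPath")]


if __name__ == "__main__":
    for url in download_urls(SAMPLE, "http://datanode.example.org"):
        print(url)
```

The resulting list can be fed to wget or any other downloader, subject to the
tokenless-security caveat above.
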
>>> 
>>> In the medium-term ESGF are planning documented service APIs that would 
>>> allow users to query the system programmatically and there is a new P2P 
>>> architecture in the works with more focus on scalability [3].
>>> 
>>> Regards,
>>> Stephen Pascoe.
>>> 
>>> [1] CMIP5 Data Reference Syntax: 
>>> http://cmip-pcmdi.llnl.gov/cmip5/docs/cmip5_data_reference_syntax.pdf
>>> [2] THREDDS Data Server: http://www.unidata.ucar.edu/projects/THREDDS/
>>> [3] ESGF P2P Architecture: http://esgf.org/wiki/ESGF_Index
>>> 
>>> ---
>>> Stephen Pascoe  +44 (0)1235 445980
>>> Centre for Environmental Data Archival
>>> STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
>>> 
>> _______________________________________________
>> GO-ESSP-TECH mailing list
>> GO-ESSP-TECH at ucar.edu
>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>
>
>

-- 

V. Balaji                               Office:  +1-609-452-6516
Head, Modeling Systems Group, GFDL      Home:    +1-212-253-6662
Princeton University                    Email: v.balaji at noaa.gov

