[Go-essp-tech] cmip5
V. Balaji
V.Balaji at noaa.gov
Wed Sep 14 09:33:42 MDT 2011
Sébastien clearly points out the infeasibility of a per-dataset selection
capability. A 'Select All' button should satisfy the need for the most
part: narrow down your search using the facets, then generate a
single-click download or a single wget script.
It is implemented this way in the GFDL data portal.
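As a rough illustration of the 'Select All' idea, the sketch below turns a flat list of selected file URLs, possibly spanning many datasets, into one wget script. It is a minimal sketch, not the GFDL portal's or the Gateway's actual script generator; the host name and URLs are hypothetical.

```python
import shlex


def make_wget_script(urls):
    """Return the text of one bash script that downloads every selected file.

    `urls` stands for the flat list of file URLs matched by a faceted search
    after "Select All"; they may come from many different datasets.
    """
    lines = ["#!/bin/bash",
             "# One script for the whole selection, not one per dataset",
             "set -e  # stop at the first failed download"]
    for url in urls:
        # wget -c continues partially downloaded files if the script is re-run
        lines.append("wget -c " + shlex.quote(url))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    selection = [  # hypothetical URLs from two different datasets
        "http://datanode.example.org/thredds/fileServer/cmip5/MODEL_A/tas_file1.nc",
        "http://datanode.example.org/thredds/fileServer/cmip5/MODEL_B/tas_file2.nc",
    ]
    print(make_wget_script(selection), end="")
```

Quoting each URL with `shlex.quote` keeps the generated script safe even if a file name contains spaces or shell metacharacters.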
Sébastien Denvil writes:
> Hi Eric,
>
> see comments and feedback below.
>
> The purpose of this email is to give an end-user perspective and to recall the
> "multi-model intercomparison" essence of CMIP5.
>
> I just hope it can be taken into account in the Gateway roadmap.
>
> On 13/09/2011 22:40, Eric Nienhouse wrote:
>> Dear All,
>>
>> Thanks everyone for your thorough and thoughtful discussion on this
>> issue. This support request details a number of usability issues
>> related to cross-cutting data discovery and access. Much can be done
>> now to improve user experience as we all collectively work towards a
>> more robust system which addresses these community needs.
>>
>> Please consider the following:
>>
>> * Install/upgrade to Gateway version 1.3.2. (This version addresses
>> several frequent end user support issues.)
>> * Enable certificate authorization (token-less) at all Gateways and Data
>> Nodes.
>> * Try out the Gateway 2.0 Beta and provide feedback:
>> (http://search-esg.prototype.ucar.edu)
>
> Since CMIP5 is a multi-model exercise, what end users are looking for is a way
> to access given quantities (air temperature and liquid precipitation, for
> example) from a given experiment for all available models (reaching that point
> with as few clicks as possible is key to success). Taking that as the primary
> requirement, the final step of the end-user scenario is:
> - the end user has a wget script that points to files across datasets.
> - the end user starts a Globus Online download that points to files across datasets.
> - and so on.
>
> Maybe the dataset-centric view doesn't make that approach easy, but we think this
> should be the way to go. I can't log in with my OpenID, so the download options
> may have changed in v2, but I can't test it.
>
> See below an example.
>
>> * Install the Gateway 2.0 Beta for federation testing and review.
>>
>> It will also be of great help to our end users to work towards
>> consistent DRS-based file URLs, improved user documentation and
>> providing services to support bulk download. Much of this is already
>> under way.
>>
>> Improving the help documentation for certificate WGet access and adding
>> Globus Online as a download option is work in progress for the next
>> Gateway release 1.3.3. This should be available very soon.
>
> What is the scenario for Globus Online? One "download job" per dataset or one
> "download job" per search criteria?
>
> Example:
>
> - I'm interested in the short-term simulations (18 experiments), all ensemble
> members (let's say 6 members on average across models for each experiment)
>
> experiments="decadal1960 decadal1965 decadal1970 decadal1975 decadal1980
> decadal1985 decadal1990 decadal1995 decadal2000 decadal2001 decadal2002
> decadal2003 decadal2004 decadal2005 decadal2006 decadal2007 decadal2008
> decadal2009"
>
> - I'm interested in these variables (6 variables from the same table!)
>
> 2D_variables[atmos][mon]="tas ts pctisccp"
> 3D_variables[atmos][mon]="ta hur clcalipso parasolRefl"
>
> - I want to analyse all the models distributing the above (let's say 14
> models)
>
> ===> So I am interested in 9072 datasets.
>
> One download per search criterion would trigger one download job in about 10
> clicks.
> One download per dataset would trigger 9072 download jobs and something like
> O(100000) clicks. This is not feasible.
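The arithmetic behind the 9072 figure can be reproduced as below. The per-job click count (`CLICKS_PER_JOB = 10`) is an assumption chosen to match the "10 clicks" and "O(100000) clicks" estimates, not a measured value.

```python
# Experiments from the example: decadal1960..decadal2000 every 5 years,
# then every year from decadal2001 to decadal2009 (18 in total).
EXPERIMENTS = (["decadal%d" % y for y in range(1960, 2001, 5)]
               + ["decadal%d" % y for y in range(2001, 2010)])

N_MEMBERS = 6        # "let's say 6 members on average"
N_MODELS = 14        # "let's say 14 models"
N_VARIABLES = 6      # as counted in the message
CLICKS_PER_JOB = 10  # assumption: roughly 10 clicks to set up one download job


def count_datasets(n_experiments, n_members, n_models, n_variables):
    """Number of (experiment, member, model, variable) combinations."""
    return n_experiments * n_members * n_models * n_variables


def clicks(n_jobs, clicks_per_job=CLICKS_PER_JOB):
    """Total clicks if every download job needs the same number of clicks."""
    return n_jobs * clicks_per_job


if __name__ == "__main__":
    n = count_datasets(len(EXPERIMENTS), N_MEMBERS, N_MODELS, N_VARIABLES)
    print("datasets:", n)                              # 9072
    print("one job per search criterion:", clicks(1))  # 10 clicks
    print("one job per dataset:", clicks(n))           # 90720 clicks, i.e. O(100000)
```

The two variable lists above actually name seven variables (tas, ts, pctisccp, ta, hur, clcalipso, parasolRefl); with seven the count would be 10584, which only strengthens the argument.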
>
> Best regards.
> Sébastien
>
>> Thank you for your time and help.
>>
>> Kind regards,
>>
>> -Eric
>>
>>
>> Williams, Dean N. wrote:
>>> Hi Jonathan,
>>>
>>> I know this can be a pain now, but we are working to improve the
>>> situation.
>>> I was also informed that others on the ESGF team are working to improve the
>>> data movement/download situation. This help comes in the form of Globus
>>> Online (GO) and Data Mover-Lite (DML). For example, DML also supports the
>>> features listed by IPSL:
>>> * support for myproxy-logon.
>>> * simple data selection by model, experiment, realm, variable, etc.
>>>   in a simple tree search.
>>> * multi-threaded downloads.
>>>   NOTE: dml-webstart only supports downloading small files, but the
>>>   standalone version supports downloading big files with multithreading.
>>> * incremental process (i.e., downloading only non-existing files).
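The "incremental process" feature amounts to skipping files that are already on disk. A minimal sketch of that check follows; it is not DML's implementation, and it assumes files are saved flat under one directory using the last component of the URL path as the file name (a real tool might mirror the DRS directory tree instead). The URLs are hypothetical.

```python
import os
import tempfile
from urllib.parse import urlparse


def local_path(url, dest_dir):
    """Where a URL is assumed to be saved: dest_dir/<last path component>."""
    return os.path.join(dest_dir, os.path.basename(urlparse(url).path))


def missing_files(urls, dest_dir):
    """Return only the URLs whose local file does not exist yet."""
    return [url for url in urls if not os.path.exists(local_path(url, dest_dir))]


def demo():
    """Simulate one file already on disk and return what is left to fetch."""
    urls = ["http://datanode.example.org/files/tas_a.nc",
            "http://datanode.example.org/files/tas_b.nc"]
    with tempfile.TemporaryDirectory() as d:
        open(local_path(urls[0], d), "w").close()  # pretend tas_a.nc was already downloaded
        return missing_files(urls, d)


if __name__ == "__main__":
    print(demo())  # only tas_b.nc remains to fetch
```

Checking existence alone does not detect truncated files; a fuller tool would also compare sizes or checksums.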
>>>
>>>
>>> The features the DML team is working to incorporate are:
>>> * managing dataset versions following the new DRS
>>> * storing download history in a database
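Managing dataset versions under the DRS largely comes down to recognising version labels and choosing the newest. A minimal sketch, assuming version directories are named "v" followed by an integer (often a date such as v20110901) and that anything else in a listing can be ignored; this is not how DML itself does it.

```python
import re

# Assumed DRS version label: "v" followed by an integer, e.g. v20110901
VERSION_RE = re.compile(r"^v(\d+)$")


def latest_version(names):
    """Return the entry with the highest DRS version number, or None.

    Versions are compared as integers, so "v10" correctly beats "v9".
    Entries that are not version labels are ignored.
    """
    best_number, best_name = -1, None
    for name in names:
        match = VERSION_RE.match(name)
        if match is None:
            continue
        number = int(match.group(1))
        if number > best_number:
            best_number, best_name = number, name
    return best_name


if __name__ == "__main__":
    print(latest_version(["v20110415", "v20110901", "latest", "README"]))  # v20110901
```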
>>>
>>>
>>> We will keep you and the community abreast of the new features as they
>>> become available.
>>>
>>> Best regards,
>>> Dean
>>>
>>> On 9/12/11 10:12 AM, "Jonathan Gregory"<j.m.gregory at reading.ac.uk>
>>> wrote:
>>>
>>>
>>>> Dear Stephen, Dean, Sebastien, Stephane cc Jamie, Martin
>>>>
>>>> Thank you very much for your emails. I am grateful to Stephen and Dean for
>>>> responding positively and constructively to my email, despite its being a
>>>> list of complaints. I'll certainly try out Sebastien's program, and
>>>> Stephane's too if you are willing to make it available; it's very helpful
>>>> that you have written these.
>>>>
>>>> Best wishes
>>>>
>>>> Jonathan
>>>>
>>>
>>> Dear Jonathan,
>>>
>>> Thanks for taking the time to describe your concerns about the usability
>>> of the CMIP5 archive system. I am CC'ing this to go-essp-tech at ucar.edu as
>>> I think your feedback is particularly welcome and insightful and deserves
>>> to be seen and discussed widely.
>>>
>>> We are aware of many of the shortcomings you identify; improvements in
>>> software and documentation are in progress that I hope will improve your
>>> experience. However, our progress has been slower than we'd hoped and we
>>> are now up against significant CMIP5 usage which will inevitably impede
>>> rolling-out improvements. We would have hoped to have the system more
>>> usable by now but we are pushing hard to improve the system as quickly as
>>> possible.
>>>
>>> You identify several user interface and performance issues with the ESG
>>> Gateway search system. Our colleagues at NCAR have been developing a new
>>> version of the Gateway with an improved search backend that I believe
>>> solves many of your concerns. I've seen a test deployment at NCAR and it
>>> is a significant improvement. We at BADC will be deploying it for testing
>>> in the next couple of days in the hope that it can be rolled-out quickly
>>> for end-users.
>>>
>>> Another point in your feedback is scriptability of downloads and checking
>>> what is available. We had hoped that the wget script generation feature
>>> of the gateway would produce wget scripts that could be edited to download
>>> different sorts of data by leveraging the Data Reference Syntax [1].
>>> Unfortunately, although some download URLs contain DRS information that
>>> would help in deducing alternative downloads, this isn't practical at
>>> present. We are working to improve the DRS consistency of the archive
>>> that we hope will improve download scriptability.
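To illustrate the kind of editing Stephen describes, the sketch below splits a download URL into DRS directory components and rebuilds the directory path for a different experiment or variable. The component order in `DRS_KEYS`, the `/thredds/fileServer/` root and the example URL are assumptions for illustration; as noted above, real URLs were not consistent enough for this to be reliable, and file names (which include the temporal subset) still cannot be deduced this way.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed order of DRS directory components (illustrative only)
DRS_KEYS = ["activity", "product", "institute", "model", "experiment",
            "frequency", "realm", "variable", "ensemble"]

# Hypothetical example URL and THREDDS file-server root
EXAMPLE_ROOT = "/thredds/fileServer/"
EXAMPLE_URL = ("http://datanode.example.org/thredds/fileServer/cmip5/output1/INST/MODEL/"
               "decadal1960/mon/atmos/tas/r1i1p1/tas_Amon_MODEL_decadal1960_r1i1p1_196101-197012.nc")


def parse_drs_url(url, root):
    """Split a file URL into (url parts, DRS component dict, file name).

    Assumes the DRS directories start immediately after `root` in the URL path.
    """
    parts = urlsplit(url)
    if not parts.path.startswith(root):
        raise ValueError("URL path does not start with %r" % root)
    components = parts.path[len(root):].strip("/").split("/")
    if len(components) < len(DRS_KEYS):
        raise ValueError("URL has fewer path components than expected")
    drs = dict(zip(DRS_KEYS, components))
    filename = components[-1] if len(components) > len(DRS_KEYS) else None
    return parts, drs, filename


def alternative_dir(url, root, **changes):
    """Rebuild the DRS directory URL with some components replaced."""
    unknown = set(changes) - set(DRS_KEYS)
    if unknown:
        raise ValueError("unknown DRS components: %s" % sorted(unknown))
    parts, drs, _ = parse_drs_url(url, root)
    drs.update(changes)
    path = root.rstrip("/") + "/" + "/".join(drs[k] for k in DRS_KEYS) + "/"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


if __name__ == "__main__":
    print(alternative_dir(EXAMPLE_URL, EXAMPLE_ROOT, experiment="decadal1965", variable="ts"))
```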
>>>
>>> The other mechanism you could use to programmatically download data and
>>> discover new data is reading the THREDDS catalogs. Every centre serving
>>> CMIP5 data is running a THREDDS Data Server [2] which lists all download
>>> URLs in a network of THREDDS XML catalogs. This is intended as an
>>> internal interface so isn't well documented. However, I think it is no
>>> secret that some users are doing this already. You can find the THREDDS
>>> source catalog of every dataset in the "History" tab of the Gateway's
>>> dataset page or they can be deduced from download URLs and a little
>>> knowledge of TDS.
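A minimal sketch of reading a THREDDS catalog with Python's standard library: it collects HTTP download URLs from `<access>` elements whose service is of type HTTPServer, plus the child catalog links from `<catalogRef>` elements that make up the catalog network. The sample catalog, host and paths are hypothetical, and real ESG catalogs may carry the URL path elsewhere (for example, a `urlPath` attribute on `<dataset>`), which this sketch does not handle.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# THREDDS catalog (InvCatalog 1.0) and XLink namespaces, in ElementTree "{uri}" form
THREDDS = "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}"
XLINK = "{http://www.w3.org/1999/xlink}"

# Hypothetical catalog for illustration only
SAMPLE_CATALOG = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink" name="example">
  <service name="fileservice" serviceType="Compound" base="">
    <service name="HTTPServer" serviceType="HTTPServer" base="/thredds/fileServer/"/>
    <service name="OPENDAP" serviceType="OPENDAP" base="/thredds/dodsC/"/>
  </service>
  <dataset name="example.dataset" ID="example.dataset">
    <dataset name="tas_file.nc" ID="example.dataset.tas_file.nc">
      <access serviceName="HTTPServer" urlPath="cmip5/output1/INST/MODEL/tas_file.nc"/>
      <access serviceName="OPENDAP" urlPath="cmip5/output1/INST/MODEL/tas_file.nc"/>
    </dataset>
  </dataset>
  <catalogRef xlink:href="sub/catalog.xml" xlink:title="sub" name=""/>
</catalog>"""


def http_download_urls(catalog_xml, server_root):
    """Return HTTP file URLs for every <access> element served by an HTTPServer service."""
    root = ET.fromstring(catalog_xml)
    # Map service name -> base path; services can be nested inside a Compound service
    http_bases = {
        svc.get("name"): svc.get("base", "")
        for svc in root.iter(THREDDS + "service")
        if svc.get("serviceType", "").lower() == "httpserver"
    }
    urls = []
    for access in root.iter(THREDDS + "access"):
        base = http_bases.get(access.get("serviceName"))
        if base is not None:  # skip OPeNDAP and other non-HTTP services
            urls.append(urljoin(server_root, base + access.get("urlPath", "")))
    return urls


def child_catalog_urls(catalog_xml, catalog_url):
    """Return absolute URLs of the catalogs referenced by <catalogRef> elements."""
    root = ET.fromstring(catalog_xml)
    return [urljoin(catalog_url, ref.get(XLINK + "href"))
            for ref in root.iter(THREDDS + "catalogRef")
            if ref.get(XLINK + "href")]


if __name__ == "__main__":
    print(http_download_urls(SAMPLE_CATALOG, "http://datanode.example.org/"))
    print(child_catalog_urls(SAMPLE_CATALOG,
                             "http://datanode.example.org/thredds/esgcet/catalog.xml"))
```

Crawling the network would fetch each child catalog in turn (for example with `urllib.request`) and repeat. Downloading the resulting URLs directly still depends on the data node's security configuration, as the next paragraph explains.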
>>>
>>> I should add that downloading data directly from a TDS will only work if
>>> it is configured to use "tokenless" security. This is the case with only
>>> some datanodes at present but should be fixed in the near term.
>>>
>>> In the medium-term ESGF are planning documented service APIs that would
>>> allow users to query the system programmatically and there is a new P2P
>>> architecture in the works with more focus on scalability [3].
>>>
>>> Regards,
>>> Stephen Pascoe.
>>>
>>> [1] CMIP5 Data Reference Syntax:
>>> http://cmip-pcmdi.llnl.gov/cmip5/docs/cmip5_data_reference_syntax.pdf
>>> [2] THREDDS Data Server: http://www.unidata.ucar.edu/projects/THREDDS/
>>> [3] ESGF P2P Architecture: http://esgf.org/wiki/ESGF_Index
>>>
>>> ---
>>> Stephen Pascoe +44 (0)1235 445980
>>> Centre of Environmental Data Archival
>>> STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
>>>
>> _______________________________________________
>> GO-ESSP-TECH mailing list
>> GO-ESSP-TECH at ucar.edu
>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>
>
>
--
V. Balaji Office: +1-609-452-6516
Head, Modeling Systems Group, GFDL Home: +1-212-253-6662
Princeton University Email: v.balaji at noaa.gov