[Go-essp-tech] ESG Gateway Search Preview

Eric Nienhouse ejn at ucar.edu
Thu Jul 28 15:12:15 MDT 2011


Hi Karl, All,

Thank you again for raising these search and discovery problems and 
keeping our users' concerns at the forefront of our attention.

As Don noted, we're working on this area as our top priority.  Good 
things are underway and we have an early access community preview ready 
for review.  We trust this is on track to provide a robust search 
experience based on the feedback from initial use of the ESG system.  
Please see further below for caveats and other details related to this 
early preview site.  The URL follows:

http://search-esg.prototype.ucar.edu/home.htm

Please feel free to try out all aspects of search.  We've worked to 
address the many issues relating to free text strings, variable short 
and standard names, multiple variables, missing ensembles, model names 
and more.  Specific details on improvements can be found further below.

Thank you for all your feedback; we truly appreciate it!

Please provide feedback to:  esg-gateway-dev at earthsystemgrid.org

And if you like, add a Jira ticket (or we'll add one for you):  
https://vets.development.ucar.edu/jira

This preview represents a significant rework of the ESG gateway search 
system.  Note that this work is in development and there are a number of 
issues to address prior to a release.  In particular, the full workflow 
through data download and Wget script generation has limitations, and 
you may run into errors.  Note, too, that the data holdings represented 
include NCAR and PCMDI CMIP5 and other projects.  We may also redeploy 
the site at any time to keep it up to date with progress.

It is our intent to deliver a production-ready release candidate as soon 
as possible.  Your feedback will be useful in achieving this goal.  Note 
that upgrading existing ESG Gateways to the current 1.3.0 version is a 
key step towards doing so.

For further information on the remaining search-related tasks, please see:

https://vets.development.ucar.edu/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+GTWY+AND+fixVersion+%3D+%22SOLR+Production+Ready%22+AND+resolution+%3D+Unresolved+ORDER+BY+due+ASC%2C+priority+DESC%2C+created+ASC&mode=hide 


Following is a list of problems raised and how they are being addressed:

*  Consistency of search results/dataset counts:

We now have consistency across search category values (e.g., institutes, 
models, ensembles) based on the authoritative source (Gateway and 
THREDDS catalog).  Gateway administrators will no longer need to add 
these values by hand to their gateway database.  The THREDDS catalogs 
ultimately drive this.  We have some work to do relating to creation of 
new standard variable names.

*  Free text searching:

Free text search is now capable of handling the common syntax and 
characters used for CMIP5 and other activity names and labels.  Further, 
words like "the" are generally excluded from the search query, yielding 
better and more expected results.  Certain fields, such as a variable 
standard name "description", are no longer used for free text searching.

*  Specific variable search issues:

Both standard names and "short" names are provided as facets.  This 
allows either form to be used, enabling faceted search of the variable 
space in cases where a standard name may not be available.  Standard 
name "long" descriptions are no longer indexed because, based on 
feedback, indexing them led to confusing results.  Multiple variable 
facets can now be selected, which is a feature many users have requested.

*  Latency in search indexing:

Publication of new datasets results in nearly immediate search 
indexing.  This means published datasets will be available via search as 
soon as they are published to the gateway.  Latency still exists when 
harvesting "remote" datasets from another gateway; this is currently 
as designed.

Again, thank you all for your feedback regarding the search experience.  
We look forward to providing a release candidate of this work for review 
as well.

Kind Regards,

The ESG Gateway Team at NCAR.

Don Middleton wrote:
> Hi Karl - Thanks for staying on top of all of these issues. They're 
> all logged into Jira and we're working on the search-related ones as a 
> top priority right now. We'll follow up shortly with a more detailed 
> status update, and a community preview of some fixes.
>
> cheers - don, on behalf of the NCAR team
>
>
> On Jul 27, 2011, at 4:53 PM, Karl Taylor wrote:
>
>> Dear all,
>>
>> Currently (as of a few minutes ago) the ESG sites are showing the 
>> following number of CMIP5 datasets:
>>
>> PCMDI   8040
>> BADC    8040
>> NCAR    8040
>> NCI     8041
>> DKRZ    8032
>> ORNL    8031
>> NERSC   7525
>> JPL     7524
>>
>>
>> I think we know about the NCI single extra dataset, which they're 
>> presumably working on removing.  Also we know NERSC and JPL are not 
>> seeing data at the NCI gateway (CSIRO model).  Can anyone explain
>>
>> 1) the missing 8 datasets at DKRZ and the 9 missing datasets at ORNL?
>> 2) the difference of 1 between NERSC and JPL?
>>
>> Best regards,
>> Karl
>>
>>
>>
>> _______________________________________________
>> GO-ESSP-TECH mailing list
>> GO-ESSP-TECH at ucar.edu
>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>


