[Go-essp-tech] search problems with ESG

Robert S. Drach drach1 at llnl.gov
Thu Jun 2 14:48:47 MDT 2011


Hi Eric,

Thanks, it works!

Here is a one-liner to republish all datasets for a given model:

% esglist_datasets --model <model_id> --no-header --select name cmip5 | esgpublish --noscan --publish --use-list -

--Bob

Eric Nienhouse wrote:
> Hi Bob,
>
> The Gateway will allow an existing dataset version to be re-retrieved 
> via esgpublish as long as there is no change to the file set it contains 
> (i.e., the same file count, file names, file identifiers, sizes, etc.).
>
> The Gateway currently supports adding checksums to dataset files 
> when a catalog is re-retrieved. This covers the use case of 
> reprocessing a dataset with checksumming enabled after it was 
> originally published.
>
> This type of catalog re-retrieval can be used to update properties that 
> do not imply a version change.  DRS properties such as experiment and 
> ensemble may be added to a dataset (if they didn't exist previously) 
> using this method.
>
> Thanks,
>
> -Eric
>
>
>
> Drach, Bob wrote:
>> Hi Eric,
>>
>>
>> On 6/2/11 7:13 AM, "Eric Nienhouse" <ejn at ucar.edu> wrote:
>>
>>> Hi Stephen,
>>>
>>> stephen.pascoe at stfc.ac.uk wrote:
>>>> Eric,
>>>>
>>>> I'm not sure this is entirely clear either.  Presumably you mean making a
>>>> Hessian call to tell the Gateway to re-retrieve the catalog.
>>>>
>>> Your presumption is correct and I like your term "re-retrieve".  I'll
>>> use this language in the future to avoid confusion :-)
>>>
>>>> From the data node's perspective this is the final step in publishing.  I
>>>> think of harvesting as the exchange of RDF among Gateways.
>>>>
>>>> So, to be clear, do we need to do something like this:
>>>>
>>>>  $ esgpublish --publish --noscan --use-list <listing-file>
>>>>
>>> Yes.  This should simply make the publishing Hessian call to the Gateway
>>> to re-retrieve the catalog for each dataset in the list.  No THREDDS
>>> catalogs will be rewritten and no data will be rescanned.
>>>
>> This will generate Gateway errors, I believe, since the datasets already
>> exist. And the alternative is to unpublish first, which presumably would
>> just compound the problem ...
>>
>> --Bob
>>
>>
>>>> This would make the publishing Hessian call for each dataset in
>>>> <listing-file> without rewriting the THREDDS catalogs or rescanning the
>>>> data (I hope :-) ).
>>>>
>>>> S.
>>>>
>>>> ---
>>>> Stephen Pascoe  +44 (0)1235 445980
>>>> Centre of Environmental Data Archival
>>>> STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Eric Nienhouse
>>>> Sent: 02 June 2011 03:44
>>>> To: Drach, Bob
>>>> Cc: go-essp-tech at ucar.edu
>>>> Subject: Re: [Go-essp-tech] search problems with ESG
>>>>
>>>> Hi Bob,
>>>>
>>>> Sorry this was not clear.  By "re-publish all affected dataset catalogs"
>>>> I mean: cause the gateway to re-harvest any existing catalogs that have
>>>> been affected by the delete side effect.  In other words, please
>>>> "re-harvest" all affected dataset catalogs.
>>>>
>>>> For example, unpublishing and republishing the INM datasets to add
>>>> checksum information may have affected other datasets (such as those
>>>> from BCC) causing the BCC datasets to lose key DRS search components.
>>>> In this case the BCC catalogs should be re-harvested by the gateway.
>>>> Note that re-harvesting INM catalogs may be required in this case as well.
>>>>
>>>> I believe the best way to do so is to run 'esgpublish --publish' on a
>>>> list of datasets.  Is this a reasonable approach?
>>>>
>>>> Please let me know if you need any more details about this.
>>>>
>>>> Thanks,
>>>>
>>>> -Eric
>>>>
>>>> Drach, Bob wrote:
>>>>> Hi Eric,
>>>>>
>>>>>
>>>>>> 2)  Please re-publish all affected dataset catalogs.
>>>>>
>>>>> Not sure what you mean. Presumably republication happens to correct errors
>>>>> or reflect modified datasets. You don't want to undo that.
>>>>>
>>>>> For example, I unpublished and republished the INM datasets to add checksum
>>>>> information.
>>>>>
>>>>> I agree it's a good idea to refrain from removing existing datasets until a
>>>>> solution to the search problem can be distributed.
>>>>>
>>>>> --Bob
>>>>>
>>>>>
>>>> _______________________________________________
>>>> GO-ESSP-TECH mailing list
>>>> GO-ESSP-TECH at ucar.edu
>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech


