[Go-essp-tech] search problems with ESG

Eric Nienhouse ejn at ucar.edu
Thu Jun 2 12:27:49 MDT 2011


Hi Bob,

The Gateway will allow an existing dataset version to be re-retrieved
via esgpublish as long as there is no change to the file set it contains
(i.e., same file count, file names, file identifiers, sizes, etc.).

The Gateway currently supports adding checksums to dataset files when a
catalog is re-retrieved.  This covers the use case of reprocessing a
dataset with checksums enabled after it was originally published.

This type of catalog re-retrieval can be used to update properties that 
do not imply a version change.  DRS properties such as experiment and 
ensemble may be added to a dataset (if they didn't exist previously) 
using this method.
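To make the rule above concrete, here is a minimal Python sketch of how a
re-retrieved catalog could be checked and merged.  It is not the Gateway's
code: the `CatalogFile` model and the `re_retrieve` function are
hypothetical.  It assumes that the file set is defined by count, identifier,
name and size; that a checksum is filled in only where the existing entry
has none; and that incoming property values (e.g. experiment, ensemble)
are added to, or replace, the existing ones.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class CatalogFile:
    """Hypothetical model of one file entry in a dataset catalog."""
    identifier: str
    name: str
    size: int
    checksum: Optional[str] = None


def file_set_key(files: List[CatalogFile]) -> Tuple[int, frozenset]:
    # The file set: count plus (identifier, name, size) of each file.
    # Checksums are excluded so they can be added on re-retrieval.
    return len(files), frozenset((f.identifier, f.name, f.size) for f in files)


def re_retrieve(
    old_files: List[CatalogFile],
    new_files: List[CatalogFile],
    old_props: Dict[str, str],
    new_props: Dict[str, str],
) -> Tuple[List[CatalogFile], Dict[str, str]]:
    """Merge a re-retrieved catalog into an existing dataset version."""
    if file_set_key(old_files) != file_set_key(new_files):
        # A changed file set implies a new version, not a re-retrieval.
        raise ValueError("file set changed: publish a new dataset version")
    incoming = {f.identifier: f for f in new_files}
    merged_files = [
        # Fill in a checksum only where the existing entry has none.
        replace(f, checksum=incoming[f.identifier].checksum)
        if f.checksum is None
        else f
        for f in old_files
    ]
    # Assumption: incoming properties are added, or override existing values.
    merged_props = {**old_props, **new_props}
    return merged_files, merged_props
```

For example, re-retrieving a two-file dataset whose new catalog carries
checksums and the DRS properties experiment and ensemble keeps the same
files, fills in both checksums and adds the two properties; dropping a file
from the new catalog raises `ValueError` instead.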

Thanks,

-Eric



Drach, Bob wrote:
> Hi Eric,
>
>
> On 6/2/11 7:13 AM, "Eric Nienhouse" <ejn at ucar.edu> wrote:
>
>   
>> Hi Stephen,
>>
>> stephen.pascoe at stfc.ac.uk wrote:
>>     
>>> Eric,
>>>
>>> I'm not sure this is entirely clear either.  Presumably you mean make a
>>> hessian call to tell the Gateway to re-retrieve the catalog.
>>>       
>> Your presumption is correct and I like your term "re-retrieve".  I'll
>> use this language in the future to avoid confusion :-)
>>     
>>> From the datanode's perspective this is the final step in publishing.  I
>>> think of harvesting as being the exchange of RDF amongst Gateways.
>>>
>>> So, to be clear, do we need to do something like this:
>>>
>>>  $ esgpublish --publish --noscan --use-list <listing-file>
>>>   
>>>       
>> Yes.  This should simply make the publishing hessian call to the Gateway
>> to re-retrieve the catalog for each dataset in the list.  No rewriting
>> of THREDDS catalogs or rescanning of data will occur.
>>     
>
> This will generate Gateway errors, I believe, since the datasets already
> exist. The alternative is to unpublish first, which presumably would
> just compound the problem ...
>
> --Bob
>
>
>   
>>> This would make the publishing hessian call for each dataset in
>>> <listing-file> without rewriting thredds or rescanning the data (I hope :-)).
>>>
>>> S.
>>>
>>> ---
>>> Stephen Pascoe  +44 (0)1235 445980
>>> Centre of Environmental Data Archival
>>> STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
>>>
>>>
>>> -----Original Message-----
>>> From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On
>>> Behalf Of Eric Nienhouse
>>> Sent: 02 June 2011 03:44
>>> To: Drach, Bob
>>> Cc: go-essp-tech at ucar.edu
>>> Subject: Re: [Go-essp-tech] search problems with ESG
>>>
>>> Hi Bob,
>>>
>>> Sorry this was not clear.  By "re-publish all affected dataset catalogs"
>>> I mean: cause the Gateway to re-harvest any existing catalogs that have
>>> been affected by the delete side effect.  In other words, please
>>> "re-harvest" all affected dataset catalogs.
>>>
>>> For example, unpublishing and republishing the INM datasets to add
>>> checksum information may have affected other datasets (such as those
>>> from BCC), causing the BCC datasets to lose key DRS search components.
>>> In this case the BCC catalogs should be re-harvested by the gateway.
>>> Note that re-harvesting INM catalogs may be required in this case as well.
>>>
>>> I believe the best way to do so is to run 'esgpublish --publish' on a
>>> list of datasets.  Is this a reasonable approach?
>>>
>>> Please let me know if you need any more details about this.
>>>
>>> Thanks,
>>>
>>> -Eric
>>>
>>> Drach, Bob wrote:
>>>   
>>>       
>>>> Hi Eric,
>>>>
>>>>
>>>>   
>>>>     
>>>>         
>>>>> 2)  Please re-publish all affected dataset catalogs.
>>>>>     
>>>>>       
>>>>>           
>>>> Not sure what you mean. Presumably republication happens to correct errors
>>>> or reflect modified datasets. You don't want to undo that.
>>>>
>>>> For example, I unpublished and republished the INM datasets to add checksum
>>>> information.
>>>>
>>>> I agree it's a good idea to refrain from removing existing datasets until a
>>>> solution to the search problem can be distributed.
>>>>
>>>> --Bob
>>>>
>>>>
>>>>   
>>>>     
>>>>         
>>> _______________________________________________
>>> GO-ESSP-TECH mailing list
>>> GO-ESSP-TECH at ucar.edu
>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>   
>>>       
>>     
>
>   


