[Go-essp-tech] search problems with ESG
Robert S. Drach
drach1 at llnl.gov
Thu Jun 2 14:48:47 MDT 2011
Hi Eric,
Thanks, it works!
Here is a one-liner to republish all datasets for a given model:
% esglist_datasets --model <model_id> --no-header --select name cmip5 |
esgpublish --noscan --publish --use-list -
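
For anyone who would rather inspect the dataset list before it goes to the Gateway, the same republish can be split into two steps. This is only a sketch: the function name republish_model, the DRY_RUN switch, and the default listing file name are assumptions made for illustration and are not part of the ESG publisher. Only the commands and flags already quoted in this thread are used. DRY_RUN defaults to 1, so nothing is executed until it is set to 0.

```shell
#!/bin/sh
# Sketch: the one-liner above, split into "list" and "publish" steps.
# Hypothetical pieces (not from the thread): republish_model, DRY_RUN,
# and the default listing file name.

republish_model() {
    model_id=$1
    listing=${2:-datasets.txt}   # hypothetical listing file name

    if [ -z "$model_id" ]; then
        echo "usage: republish_model <model_id> [listing-file]" >&2
        return 2
    fi

    if [ "${DRY_RUN:-1}" = 1 ]; then
        # Dry run: print the two commands instead of running them.
        echo "esglist_datasets --model $model_id --no-header --select name cmip5 > $listing"
        echo "esgpublish --noscan --publish --use-list $listing"
        return 0
    fi

    # Step 1: write the dataset names for this model to the listing file.
    esglist_datasets --model "$model_id" --no-header --select name cmip5 \
        > "$listing" || return 1

    # Step 2: make the publish call for each listed dataset,
    # without rescanning the data (--noscan).
    esgpublish --noscan --publish --use-list "$listing"
}
```

Running "republish_model <model_id>" prints the two commands; "DRY_RUN=0 republish_model <model_id>" runs them and leaves the listing file behind for review.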
--Bob
Eric Nienhouse wrote:
> Hi Bob,
>
> The Gateway will allow an existing dataset version to be re-retrieved
> via esgpublish as long as there is no change to the file set it contains
> (i.e., same file count, file names, file identifiers, sizes, etc.).
>
> There is currently support in the Gateway to allow additional checksums
> to be added to dataset files when a catalog is re-retrieved. This
> supports the use case of reprocessing a dataset with checksum enabled
> after it was originally published.
>
> This type of catalog re-retrieval can be used to update properties that
> do not imply a version change. DRS properties such as experiment and
> ensemble may be added to a dataset (if they didn't exist previously)
> using this method.
>
> Thanks,
>
> -Eric
>
>
>
> Drach, Bob wrote:
>
>> Hi Eric,
>>
>>
>> On 6/2/11 7:13 AM, "Eric Nienhouse" <ejn at ucar.edu> wrote:
>>
>>
>>
>>> Hi Stephen,
>>>
>>> stephen.pascoe at stfc.ac.uk wrote:
>>>
>>>
>>>> Eric,
>>>>
>>>> I'm not sure this is entirely clear either. Presumably you mean make a
>>>> hessian call to tell the Gateway to re-retrieve the catalog.
>>>>
>>>>
>>> Your presumption is correct and I like your term "re-retrieve". I'll
>>> use this language in the future to avoid confusion :-)
>>>
>>>
>>>> From the datanode's perspective this is the final step in publishing. I
>>>> think of harvesting as being the exchange of RDF amongst Gateways.
>>>>
>>>> So, to be clear, do we need to do something like this:
>>>>
>>>> $ esgpublish --publish --noscan --use-list <listing-file>
>>>>
>>>>
>>>>
>>> Yes. This should simply make the publishing hessian call to the Gateway
>>> to re-retrieve the catalog for each dataset in the list. No rewriting
>>> of thredds nor rescanning of data will occur.
>>>
>>>
>> This will generate gateway errors I believe, since the datasets already
>> exist. And the alternative is to unpublish first, which presumably would
>> just compound the problem ...
>>
>> --Bob
>>
>>
>>
>>
>>>> This would make the publishing hessian call for each dataset in
>>>> <listing-file> without rewriting thredds or rescanning the data (I hope :-) ).
>>>>
>>>> S.
>>>>
>>>> ---
>>>> Stephen Pascoe +44 (0)1235 445980
>>>> Centre of Environmental Data Archival
>>>> STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On
>>>> Behalf Of Eric Nienhouse
>>>> Sent: 02 June 2011 03:44
>>>> To: Drach, Bob
>>>> Cc: go-essp-tech at ucar.edu
>>>> Subject: Re: [Go-essp-tech] search problems with ESG
>>>>
>>>> Hi Bob,
>>>>
>>>> Sorry this was not clear. By "re-publish all affected dataset catalogs"
>>>> I mean: cause the gateway to re-harvest any existing catalogs that have
>>>> been affected by the delete side effect. In other words, please
>>>> "re-harvest" all affected dataset catalogs.
>>>>
>>>> For example, unpublishing and republishing the INM datasets to add
>>>> checksum information may have affected other datasets (such as those
>>>> from BCC), causing the BCC datasets to lose key DRS search components.
>>>> In this case the BCC catalogs should be re-harvested by the gateway.
>>>> Note that re-harvesting INM catalogs may be required in this case as well.
>>>>
>>>> I believe the best way to do so is to run 'esgpublish --publish' on a
>>>> list of datasets. Is this a reasonable approach?
>>>>
>>>> Please let me know if you need any more details about this.
>>>>
>>>> Thanks,
>>>>
>>>> -Eric
>>>>
>>>> Drach, Bob wrote:
>>>>
>>>>
>>>>
>>>>> Hi Eric,
>>>>>
>>>>>> 2) Please re-publish all affected dataset catalogs.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> Not sure what you mean. Presumably republication happens to correct errors
>>>>> or reflect modified datasets. You don't want to undo that.
>>>>>
>>>>> For example, I unpublished and republished the INM datasets to add checksum
>>>>> information.
>>>>>
>>>>> I agree it's a good idea to refrain from removing existing datasets until a
>>>>> solution to the search problem can be distributed.
>>>>>
>>>>> --Bob
>>>>>
>>>> _______________________________________________
>>>> GO-ESSP-TECH mailing list
>>>> GO-ESSP-TECH at ucar.edu
>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>
>>>>
>>>>