[Go-essp-tech] Fwd: Re: Publishing dataset with option --update

Drach, Bob drach1 at llnl.gov
Tue Jan 10 13:42:03 MST 2012


Hi Stephen,

You're right. It's perfectly OK - and in general preferable - to publish a
new version without unpublishing the previous version.

In the case below, my impression was that preserving the previous version
was not the major concern. Given that the publication of new files
apparently didn't succeed (for some reason as yet to be determined) it seems
simplest to just delete the old version from the gateway and republish. That
certainly should work.

BTW, the purpose of the --update option is to allow addition of new files to
an existing dataset, by listing only the new files. It should be equivalent
to running esgpublish using a complete listing, including old and new files,
without the --update option. Since files are being added, the default action
would be to create a new version of the dataset.
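
To make that equivalence concrete, here is a minimal Python sketch. It is
not part of esgpublish. It assumes a mapfile is plain text with one file
per line and pipe-separated fields of the form "dataset_id | path | size";
the helper names parse_mapfile_line and merge_mapfiles are made up for
illustration. The merged list stands for the complete listing (old and new
files) that you would otherwise pass without --update.

```python
def parse_mapfile_line(line):
    """Split one mapfile line into (dataset_id, path, other_fields).

    Assumed layout (not taken from the esgpublish docs):
        dataset_id | /absolute/path/file.nc | size
    """
    fields = [f.strip() for f in line.split("|")]
    if len(fields) < 2:
        raise ValueError("malformed mapfile line: %r" % line)
    return fields[0], fields[1], fields[2:]


def merge_mapfiles(old_lines, new_lines):
    """Return the complete listing: old entries plus new ones.

    A new entry with the same (dataset_id, path) as an old entry
    replaces it, so no file is listed twice.
    """
    merged = {}
    for line in list(old_lines) + list(new_lines):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        dataset_id, path, _ = parse_mapfile_line(line)
        merged[(dataset_id, path)] = line
    return list(merged.values())


if __name__ == "__main__":
    old = ["ds | /d/a.nc | 10", "ds | /d/b.nc | 20"]
    new = ["ds | /d/c.nc | 30"]
    for entry in merge_mapfiles(old, new):
        print(entry)
```

Keying on (dataset_id, path) means each file path appears only once per
dataset in the merged listing.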

--Bob

On 1/9/12 3:15 AM, "stephen.pascoe at stfc.ac.uk" <stephen.pascoe at stfc.ac.uk>
wrote:

> Hi Bob,
>
> This "unpublish first" requirement is news to me.  We've been publishing new
> versions without doing this for some time.  Now, we have come across
> difficulties with a few datasets but it's generally worked.
>
> We don't use the --update option though.  Each time we publish a new version
> we provide a mapfile of all files in the dataset(s).  I'd recommend Sergey try
> doing this before removing a previous version.
>
> If you unpublish from the Gateway first you'll lose the information in the
> "History" tab.  For instance
> http://cmip-gw.badc.rl.ac.uk/dataset/cmip5.output2.MOHC.HadGEM2-ES.rcp85.mon.aerosol.aero.r1i1p1.html
> shows 2 versions.
>
> Stephen.
>
> ---
> Stephen Pascoe  +44 (0)1235 445980
> Centre of Environmental Data Archival
> STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
>
>
> -----Original Message-----
> From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On
> Behalf Of Drach, Bob
> Sent: 06 January 2012 20:53
> To: Serguei Nikonov; Eric Nienhouse
> Cc: go-essp-tech at ucar.edu
> Subject: Re: [Go-essp-tech] Fwd: Re: Publishing dataset with option --update
>
> Hi Sergey,
>
> When updating a dataset, it's also important to unpublish it before publishing
> the new version. E.g., first run
>
> esgunpublish <dataset_id>
>
> The reason is that, when you publish to the gateway, the gateway software
> tries to *add* the new information to the existing dataset entry, rather than
> replace it.
>
> --Bob
> ________________________________________
> From: Serguei Nikonov [serguei.nikonov at noaa.gov]
> Sent: Friday, January 06, 2012 10:45 AM
> To: Eric Nienhouse
> Cc: Bob Drach; go-essp-tech at ucar.edu
> Subject: Re: [Go-essp-tech] Fwd: Re:  Publishing dataset with option --update
>
> Hi Eric,
>
> thanks for your help. I have no objections to whatever versioning policy is
> adopted. What I need is to know how to apply it. The ways I used did not work
> for me. Hopefully, the reason is the bad entries in thredds and the database
> that you pointed out. I am cleaning them up right now, then will see...
>
> Just for clarification: if I need to update a dataset (changing its version),
> I create a map file containing the full set of files (old and new ones) and
> then use this map file in the esgpublish script with the --update option, is
> that correct? Will it be enough to create a new version of the dataset? BTW,
> there is nothing about versions under the 'update' option in the esgpublish
> help.
>
> Thanks,
> Sergey
>
>
>
> On 01/04/2012 04:27 PM, Eric Nienhouse wrote:
>> Hi Serguei,
>>
>> Following are a few more suggestions to diagnose this publishing issue. I
>> agree with others on this thread that adding new files (or changing existing
>> ones) should always trigger a new dataset version.
>>
>> It does not appear you are receiving a final "SUCCESS" or failure message
>> when publishing to the Gateway (with esgpublish --publish). Please try
>> increasing your "polling" levels in your $ESGINI file. E.g.:
>>
>> hessian_service_polling_delay = 10
>> hessian_service_polling_iterations = 500
>>
>> You should see a final "SUCCESS" or "ERROR" with Java trace output at the
>> termination of the command.
>>
>> I've reviewed the Thredds catalog for the dataset you note below:
>>
>>
>> http://esgdata.gfdl.noaa.gov/thredds/esgcet/1/cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.v2.xml
>>
>>
>> There appear to be multiple instances of certain files within the catalog,
>> which is a problem. The Gateway publish will fail if a particular file (URL)
>> is referenced multiple times with differing metadata. An example is:
>>
>>
>> */gfdl_dataroot/NOAA-GFDL/GFDL-CM3/historical/mon/atmos/Amon/r1i1p1/v20110601/rtmt/rtmt_Amon_GFDL-CM3_historical_r1i1p1_186001-186412.nc
>>
>>
>> This file appears as two separate file versions in the Thredds catalog (one
>> with id ending in ".nc" and another with ".nc_0"). There should be only one
>> reference to this file URL in the catalog.
>>
>> The previous version of the dataset in the publisher/node database may be
>> leading to this issue. You may need to add "--database-delete" to your
>> esgunpublish command to clean things up. Bob can advise on this. Note that
>> the original esgpublish command shown in this email thread included
>> "--keep-version".
>>
>> After publishing to the Gateway successfully, you can check the dataset
>> details by URL with the published dataset identifier. For example:
>>
>>
>> http://pcmdi3.llnl.gov/esgcet/dataset/cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.html
>>
>>
>> I hope this helps.
>>
>> Regards,
>>
>> -Eric
>>
>> Serguei Nikonov wrote:
>>> Hi Bob,
>>>
>>> I still cannot get dataset updates to work. The commands you suggested
>>> executed successfully, but the datasets did not appear on the gateway. I
>>> tried it several times for different datasets, but the result is the same.
>>>
>>> Do you have any idea what to do in such a situation?
>>>
>>> Here are some details about what I tried.
>>> I needed to add a file to the dataset
>>> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.
>>> As you advised, I unpublished it (esgunpublish
>>> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1), then
>>> created a full mapfile (with the additional file) and published it:
>>> esgpublish --read-files --map new_mapfile --project cmip5 --thredd --publish
>>>
>>> As I said, there were no errors. The dataset is in the database and in
>>> thredds but not in the gateway.
>>>
>>> The second way I tried was using a mapfile containing only the files to
>>> update. I needed to replace several existing files in a dataset with new
>>> ones. I created a mapfile with only the files to be replaced:
>>> esgscan_directory --read-files --project cmip5 -o mapfile.txt
>>> /data/CMIP5/output1/NOAA-GFDL/GFDL-ESM2M/historical/mon/ocean/Omon/r1i1p1/v20111206
>>>
>>> and then published it with update option:
>>> esgpublish --update --map mapfile.txt --project cmip5 --thredd --publish.
>>>
>>> The result is the same as in the previous case - everything is fine locally,
>>> but nothing happened on the gateway.
>>>
>>> Thanks,
>>> Sergey
>>>
>>> -------- Original Message --------
>>> Subject: Re: [Go-essp-tech] Publishing dataset with option --update
>>> Date: Thu, 29 Dec 2011 11:02:05 -0500
>>> From: Serguei Nikonov <Serguei.Nikonov at noaa.gov>
>>> Organization: GFDL
>>> To: Drach, Bob <drach1 at llnl.gov>
>>> CC: Nathan Wilhelmi <wilhelmi at ucar.edu>, "Ganzberger, Michael"
>>> <Ganzberger1 at llnl.gov>, "go-essp-tech at ucar.edu" <go-essp-tech at ucar.edu>
>>>
>>> Hi Bob,
>>>
>>> I tried the 1st way you suggested and it worked partially - the dataset was
>>> created on the datanode with version 2 but it did not show up on the
>>> gateway. To make sure that this was not a one-off result, I repeated it with
>>> another dataset, with the same result.
>>> Now I have 2 datasets on the datanode (visible in the thredds server) but
>>> they are absent from the gateway:
>>> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.v2
>>> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r2i1p1.v2.
>>>
>>> Does it make sense to repeat esgpublish with the 'publish' option?
>>>
>>> Thanks and Happy New Year,
>>> Sergey
>>>
>>> On 12/21/2011 08:41 PM, Drach, Bob wrote:
>>>> Hi Sergey,
>>>>
>>>> The way I would recommend adding new files to an existing dataset is as
>>>> follows:
>>>>
>>>> - Unpublish the previous dataset from the gateway and thredds
>>>>
>>>> % esgunpublish
>>>> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1
>>>>
>>>> - Add the new files to the existing mapfile for the dataset they are being
>>>> added to.
>>>>
>>>> - Republish with the expanded mapfile:
>>>>
>>>> % esgpublish --read-files --map newmap.txt --project cmip5 --thredds
>>>> --publish
>>>>
>>>> The publisher will:
>>>> - not rescan existing files, only the new files
>>>> - create a new version to reflect the additional files
>>>>
>>>>
>>>> Alternatively you can create a mapfile with *only* the new files (using
>>>> esgscan_directory), then republish using the --update option.
>>>>
>>>> --Bob
>>>>
>>>>
>>>> On 12/21/11 8:40 AM, "Serguei Nikonov"<serguei.nikonov at noaa.gov> wrote:
>>>>
>>>>> Hi Nate,
>>>>>
>>>>> unfortunately this is not the only dataset I have a problem with - there
>>>>> are at least 5 more. Should I unpublish them locally (db, thredds) and
>>>>> then create a new version containing the full set of files? What is the
>>>>> official way to update a dataset?
>>>>>
>>>>> Thanks,
>>>>> Sergey
>>>>>
>>>>>
>>>>> On 12/20/2011 07:06 PM, Nathan Wilhelmi wrote:
>>>>>> Hi Bob/Mike,
>>>>>>
>>>>>> I believe the problem is that when files were added the timestamp on the
>>>>>> dataset wasn't updated.
>>>>>>
>>>>>> The triple store will only harvest datasets that have files and an
>>>>>> updated timestamp after the last harvest.
>>>>>>
>>>>>> So what likely happened is the dataset was created without files, so it
>>>>>> wasn't initially harvested. Files were subsequently added, but the
>>>>>> timestamp wasn't updated, so it was still not a candidate for harvesting.
>>>>>>
>>>>>> Can you update the date_updated timestamp for the dataset in question and
>>>>>> then trigger the RDF harvesting? I believe the dataset will show up then.
>>>>>>
>>>>>> Thanks!
>>>>>> -Nate
>>>>>>
>>>>>> On 12/20/2011 11:49 AM, Serguei Nikonov wrote:
>>>>>>> Hi Mike,
>>>>>>>
>>>>>>> I am a member of the data publishers group. I have published a
>>>>>>> considerable amount of data without this kind of trouble; the problem
>>>>>>> occurred only when I tried to add some files to an existing dataset.
>>>>>>> Publishing from scratch works fine for me.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sergey
>>>>>>>
>>>>>>> On 12/20/2011 01:29 PM, Ganzberger, Michael wrote:
>>>>>>>> Hi Serguei,
>>>>>>>>
>>>>>>>> That task is on a scheduler and will re-run every 10 minutes. If your
>>>>>>>> data does not appear after that time then perhaps there is another
>>>>>>>> issue. One issue could be that publishing to the gateway requires that
>>>>>>>> you have the role of "Data Publisher":
>>>>>>>>
>>>>>>>> "check that the account is member of the proper group and has the
>>>>>>>> special role of Data Publisher"
>>>>>>>>
>>>>>>>> http://esgf.org/wiki/ESGFNode/FAQ
>>>>>>>>
>>>>>>>> Mike
>>>>>>>>
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Serguei Nikonov [mailto:serguei.nikonov at noaa.gov]
>>>>>>>> Sent: Tuesday, December 20, 2011 10:12 AM
>>>>>>>> To: Ganzberger, Michael
>>>>>>>> Cc: Stéphane Senesi; Drach, Bob; go-essp-tech at ucar.edu
>>>>>>>> Subject: Re: [Go-essp-tech] Publishing dataset with option --update
>>>>>>>>
>>>>>>>> Hi Mike,
>>>>>>>>
>>>>>>>> thanks for the suggestion, but I don't have any privileges to do
>>>>>>>> anything on the gateway. I am just publishing data on the GFDL data
>>>>>>>> node.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Sergey
>>>>>>>>
>>>>>>>> On 12/20/2011 01:05 PM, Ganzberger, Michael wrote:
>>>>>>>>> Hi Serguei,
>>>>>>>>>
>>>>>>>>> I'd like to suggest this, which may help you, from
>>>>>>>>> http://esgf.org/wiki/Cmip5Gateway/FAQ
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> "The search does not reflect the latest DB changes I've made
>>>>>>>>>
>>>>>>>>> You have to manually trigger the 3store harvesting. Logging as root
>>>>>>>>> and go to Admin->"Gateway Scheduled Tasks"->"Run tasks" and restart
>>>>>>>>> the job named RDFSynchronizationJobDetail"
>>>>>>>>>
>>>>>>>>> Mike Ganzberger
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: go-essp-tech-bounces at ucar.edu
>>>>>>>>> [mailto:go-essp-tech-bounces at ucar.edu]
>>>>>>>>> On Behalf Of Stéphane Senesi
>>>>>>>>> Sent: Tuesday, December 20, 2011 9:42 AM
>>>>>>>>> To: Serguei Nikonov
>>>>>>>>> Cc: Drach, Bob; go-essp-tech at ucar.edu
>>>>>>>>> Subject: Re: [Go-essp-tech] Publishing dataset with option --update
>>>>>>>>>
>>>>>>>>> Serguei
>>>>>>>>>
>>>>>>>>> We have for some time now experienced similar problems when publishing
>>>>>>>>> to the PCMDI gateway, i.e. not getting a "SUCCESS" message when
>>>>>>>>> publishing. Sometimes files are actually published (or at least
>>>>>>>>> accessible through the gateway, with their status still
>>>>>>>>> "START_PUBLISHING" according to the esg_list_datasets report),
>>>>>>>>> sometimes not. One hypothesis is that the load on the PCMDI Gateway
>>>>>>>>> causes the problem. We haven't yet got a confirmation from Bob.
>>>>>>>>>
>>>>>>>>> In contrast to your case, this happens when publishing a dataset from
>>>>>>>>> scratch (I mean, not an update).
>>>>>>>>>
>>>>>>>>> Best regards (do not expect any feedback from me before early January,
>>>>>>>>> though)
>>>>>>>>>
>>>>>>>>> S
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Serguei Nikonov wrote, On 20/12/2011 18:11:
>>>>>>>>>> Hi Bob,
>>>>>>>>>>
>>>>>>>>>> I needed to add some missing variables to an existing dataset and I
>>>>>>>>>> found the --update option in the esgpublish command. When I tried it
>>>>>>>>>> I got normal messages like
>>>>>>>>>> INFO 2011-12-20 11:21:00,893 Publishing:
>>>>>>>>>> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1,
>>>>>>>>>> parent = pcmdi.GFDL
>>>>>>>>>> INFO 2011-12-20 11:21:07,564 Result: PROCESSING
>>>>>>>>>> INFO 2011-12-20 11:21:11,209 Result: PROCESSING
>>>>>>>>>> ....
>>>>>>>>>>
>>>>>>>>>> but nothing happened on the gateway - the new variables are not there.
>>>>>>>>>> The files corresponding to these variables are in the database and in
>>>>>>>>>> the THREDDS catalog but apparently were not published on the gateway.
>>>>>>>>>>
>>>>>>>>>> I used the command line
>>>>>>>>>> esgpublish --update --keep-version --map <map_file> --project cmip5
>>>>>>>>>> --noscan --publish.
>>>>>>>>>>
>>>>>>>>>> Should the map file be in some specific format to make it work in the
>>>>>>>>>> mode I need?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Sergey Nikonov
>>>>>>>>>> GFDL
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> GO-ESSP-TECH mailing list
>>>>>>>>>> GO-ESSP-TECH at ucar.edu
>>>>>>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>>>>>>>
>>>>>>>>>>