[Go-essp-tech] Fwd: Re: Publishing dataset with option --update

stephen.pascoe at stfc.ac.uk stephen.pascoe at stfc.ac.uk
Mon Jan 9 04:15:02 MST 2012


Hi Bob,

This "unpublish first" requirement is news to me.  We've been publishing new versions without doing this for some time.  Now, we have come across difficulties with a few datasets but it's generally worked.

We don't use the --update option though.  Each time we publish a new version we provide a mapfile of all files in the dataset(s).  I'd recommend Sergey try doing this before removing a previous version.
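One way to assemble such a full mapfile from a previous mapfile plus newly scanned entries is sketched below. This is only an illustration under stated assumptions: it assumes each mapfile line is pipe-separated with the dataset id in the first field and the file path in the second, and the helper name merge_mapfile_lines is hypothetical, not part of the publisher.

```python
def merge_mapfile_lines(old_lines, new_lines):
    """Combine mapfile lines; a new entry replaces an old one with the same path."""
    merged = {}
    for line in list(old_lines) + list(new_lines):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 2:
            raise ValueError(f"unexpected mapfile line: {line!r}")
        path = fields[1]     # assumption: second field is the file path
        merged[path] = line  # later (new) entries win over earlier ones
    # Sort by path so the output is stable from run to run.
    return [merged[path] for path in sorted(merged)]


if __name__ == "__main__":
    # Hypothetical entries, for illustration only.
    old = ["ds | /data/a.nc | 10", "ds | /data/b.nc | 20"]
    new = ["ds | /data/b.nc | 25", "ds | /data/c.nc | 30"]
    for entry in merge_mapfile_lines(old, new):
        print(entry)
```

The merged lines would then be written to the mapfile passed to esgpublish with --map, so that every file in the dataset is listed each time a new version is published.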

If you unpublish from the Gateway first you'll lose the information in the "History" tab.  For instance http://cmip-gw.badc.rl.ac.uk/dataset/cmip5.output2.MOHC.HadGEM2-ES.rcp85.mon.aerosol.aero.r1i1p1.html shows 2 versions.

Stephen.

---
Stephen Pascoe  +44 (0)1235 445980
Centre of Environmental Data Archival
STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK


-----Original Message-----
From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Drach, Bob
Sent: 06 January 2012 20:53
To: Serguei Nikonov; Eric Nienhouse
Cc: go-essp-tech at ucar.edu
Subject: Re: [Go-essp-tech] Fwd: Re: Publishing dataset with option --update

Hi Sergey,

When updating a dataset, it's also important to unpublish it before publishing the new version. E.g., first run

esgunpublish <dataset_id>

The reason is that, when you publish to the gateway, the gateway software tries to *add* the new information to the existing dataset entry, rather than replace it.
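
As a rough illustration of the order of operations described here, the sketch below only builds the two command lines (unpublish first, then republish with the full mapfile) and prints them; it does not run anything. The function name republish_commands and the mapfile name are hypothetical; the flags are the ones quoted elsewhere in this thread.

```python
import shlex


def republish_commands(dataset_id, mapfile, project="cmip5"):
    """Return the unpublish and publish command lines, in the order to run them."""
    unpublish = ["esgunpublish", dataset_id]
    publish = [
        "esgpublish", "--read-files", "--map", mapfile,
        "--project", project, "--thredds", "--publish",
    ]
    return [unpublish, publish]


if __name__ == "__main__":
    dataset = "cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1"
    for argv in republish_commands(dataset, "newmap.txt"):
        # shlex.quote keeps the printed line safe to paste into a shell.
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing the commands rather than executing them makes it easy to review the sequence before anything is removed from the gateway.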

--Bob
________________________________________
From: Serguei Nikonov [serguei.nikonov at noaa.gov]
Sent: Friday, January 06, 2012 10:45 AM
To: Eric Nienhouse
Cc: Bob Drach; go-essp-tech at ucar.edu
Subject: Re: [Go-essp-tech] Fwd: Re:  Publishing dataset with option --update

Hi Eric,

thanks for your help. I have no objections to the adopted versioning
policy. What I need is to know how to apply it. The approaches I tried did not
work for me. Hopefully the cause is the bad entries in THREDDS and the database
that you pointed out. I am cleaning them up right now, then will see...

Just for clarification: if I need to update a dataset (changing its version), I
create a map file containing the full set of files (old and new ones) and then
use this map file in the esgpublish script with the option --update, is that
correct? Will that be enough to create a new version of the dataset? BTW, the
esgpublish help says nothing about versions under the 'update' option.

Thanks,
Sergey



On 01/04/2012 04:27 PM, Eric Nienhouse wrote:
> Hi Serguei,
>
> Following are a few more suggestions to diagnose this publishing issue. I agree
> with others on this thread that adding new files (or changing existing ones)
> should always trigger a new dataset version.
>
> It does not appear you are receiving a final "SUCCESS" or failure message when
> publishing to the Gateway (with esgpublish --publish). Please try increasing
> your "polling" levels in your $ESGINI file. E.g.:
>
> hessian_service_polling_delay = 10
> hessian_service_polling_iterations = 500
>
> You should see a final "SUCCESS" or "ERROR" with Java trace output at the
> termination of the command.
>
> I've reviewed the Thredds catalog for the dataset you note below:
>
>
> http://esgdata.gfdl.noaa.gov/thredds/esgcet/1/cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.v2.xml
>
>
> There appear to be multiple instances of certain files within the catalog which
> is a problem. The Gateway publish will fail if a particular file (URL) is
> referenced multiple times with differing metadata. An example is:
>
>
> */gfdl_dataroot/NOAA-GFDL/GFDL-CM3/historical/mon/atmos/Amon/r1i1p1/v20110601/rtmt/rtmt_Amon_GFDL-CM3_historical_r1i1p1_186001-186412.nc
>
>
> This file appears as two separate file versions in the Thredds catalog (one with
> id ending in ".nc" and another with ".nc_0"). There should be only one reference
> to this file URL in the catalog.
>
> The previous version of the dataset in the publisher/node database may be
> leading to this issue. You may need to add "--database-delete" to your
> esgunpublish command to clean things up. Bob can advise on this. Note that the
> original esgpublish command shown in this email thread included "--keep-version".
>
> After publishing to the Gateway successfully, you can check the dataset details
> by URL with the published dataset identifier. For example:
>
>
> http://pcmdi3.llnl.gov/esgcet/dataset/cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.html
>
>
> I hope this helps.
>
> Regards,
>
> -Eric
>
> Serguei Nikonov wrote:
>> Hi Bob,
>>
>> I still cannot get dataset updates to work. The commands you suggested
>> executed successfully, but the datasets did not appear on the gateway. I
>> tried several times with different datasets, but the result is the same.
>>
>> Do you have any idea what to do in this situation?
>>
>> Here are some details about what I tried.
>> I needed to add a file to the dataset
>> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.
>> As you advised, I unpublished it (esgunpublish
>> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1), then
>> created a full mapfile (including the additional file) and published it:
>> esgpublish --read-files --map new_mapfile --project cmip5 --thredd --publish
>>
>> As I said, there were no errors. The dataset is in the database and in THREDDS
>> but not on the gateway.
>>
>> The second way I tried was to use a mapfile containing only the files to
>> update. I needed to replace several existing files in a dataset with new ones,
>> so I created a mapfile with only the replacement files:
>> esgscan_directory --read-files --project cmip5 -o mapfile.txt
>> /data/CMIP5/output1/NOAA-GFDL/GFDL-ESM2M/historical/mon/ocean/Omon/r1i1p1/v20111206
>>
>> and then published it with the update option:
>> esgpublish --update --map mapfile.txt --project cmip5 --thredd --publish
>>
>> The result is the same as in the previous case: everything is fine locally, but
>> nothing happened on the gateway.
>>
>> Thanks,
>> Sergey
>>
>> -------- Original Message --------
>> Subject: Re: [Go-essp-tech] Publishing dataset with option --update
>> Date: Thu, 29 Dec 2011 11:02:05 -0500
>> From: Serguei Nikonov <Serguei.Nikonov at noaa.gov>
>> Organization: GFDL
>> To: Drach, Bob <drach1 at llnl.gov>
>> CC: Nathan Wilhelmi <wilhelmi at ucar.edu>, "Ganzberger, Michael"
>> <Ganzberger1 at llnl.gov>, "go-essp-tech at ucar.edu" <go-essp-tech at ucar.edu>
>>
>> Hi Bob,
>>
>> I tried the first way you suggested and it worked partially: the dataset was
>> created on the data node with version 2, but it did not show up on the gateway.
>> To make sure this was not a one-off result, I repeated it with another dataset,
>> with the same result.
>> Now I have 2 datasets on the data node (visible in the THREDDS server) that are
>> absent from the gateway:
>> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.v2
>> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r2i1p1.v2.
>>
>> Does it make sense to repeat esgpublish with the 'publish' option?
>>
>> Thanks and Happy New Year,
>> Sergey
>>
>> On 12/21/2011 08:41 PM, Drach, Bob wrote:
>>> Hi Sergey,
>>>
>>> The way I would recommend adding new files to an existing dataset is as
>>> follows:
>>>
>>> - Unpublish the previous dataset from the gateway and thredds
>>>
>>> % esgunpublish
>>> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1
>>>
>>> - Add the new files to the existing mapfile for the dataset they are being
>>> added to.
>>>
>>> - Republish with the expanded mapfile:
>>>
>>> % esgpublish --read-files --map newmap.txt --project cmip5 --thredds
>>> --publish
>>>
>>> The publisher will:
>>> - not rescan existing files, only the new files
>>> - create a new version to reflect the additional files
>>>
>>>
>>> Alternatively you can create a mapfile with *only* the new files (using
>>> esgscan_directory), then republish using the --update option.
>>>
>>> --Bob
>>>
>>>
>>> On 12/21/11 8:40 AM, "Serguei Nikonov"<serguei.nikonov at noaa.gov> wrote:
>>>
>>>> Hi Nate,
>>>>
>>>> unfortunately this is not the only dataset I have a problem with; there are
>>>> at least 5 more. Should I unpublish them locally (db, thredds) and then create
>>>> a new version containing the full set of files? What is the official way to
>>>> update a dataset?
>>>>
>>>> Thanks,
>>>> Sergey
>>>>
>>>>
>>>> On 12/20/2011 07:06 PM, Nathan Wilhelmi wrote:
>>>>> Hi Bob/Mike,
>>>>>
>>>>> I believe the problem is that when files were added the timestamp on the
>>>>> dataset
>>>>> wasn't updated.
>>>>>
>>>>> The triple store will only harvest datasets that have files and an updated
>>>>> timestamp after the last harvest.
>>>>>
>>>>> So what likely happened is the dataset was created without files, so it
>>>>> wasn't
>>>>> initially harvested. Files were subsequently added, but the timestamp wasn't
>>>>> updated, so it was still not a candidate for harvesting.
>>>>>
>>>>> Can you update the date_updated timestamp for the dataset in question and
>>>>> then trigger the RDF harvesting? I believe the dataset will show up then.
>>>>>
>>>>> Thanks!
>>>>> -Nate
>>>>>
>>>>> On 12/20/2011 11:49 AM, Serguei Nikonov wrote:
>>>>>> Hi Mike,
>>>>>>
>>>>>> I am a member of the data publishers group. I have been publishing a
>>>>>> considerable amount of data without this kind of trouble; the problem
>>>>>> occurred only when I tried to add some files to an existing dataset.
>>>>>> Publishing from scratch works fine for me.
>>>>>>
>>>>>> Thanks,
>>>>>> Sergey
>>>>>>
>>>>>> On 12/20/2011 01:29 PM, Ganzberger, Michael wrote:
>>>>>>> Hi Serguei,
>>>>>>>
>>>>>>> That task is on a scheduler and will re-run every 10 minutes. If your data
>>>>>>> does not appear after that time then perhaps there is another issue. One
>>>>>>> issue could be that publishing to the gateway requires that you have the
>>>>>>> role
>>>>>>> of "Data Publisher";
>>>>>>>
>>>>>>> "check that the account is member of the proper group and has the special
>>>>>>> role of Data Publisher"
>>>>>>>
>>>>>>> http://esgf.org/wiki/ESGFNode/FAQ
>>>>>>>
>>>>>>> Mike
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Serguei Nikonov [mailto:serguei.nikonov at noaa.gov]
>>>>>>> Sent: Tuesday, December 20, 2011 10:12 AM
>>>>>>> To: Ganzberger, Michael
>>>>>>> Cc: Stéphane Senesi; Drach, Bob; go-essp-tech at ucar.edu
>>>>>>> Subject: Re: [Go-essp-tech] Publishing dataset with option --update
>>>>>>>
>>>>>>> Hi Mike,
>>>>>>>
>>>>>>> thanks for the suggestion, but I don't have any privileges to do anything
>>>>>>> on the gateway.
>>>>>>> I am just publishing data on the GFDL data node.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Sergey
>>>>>>>
>>>>>>> On 12/20/2011 01:05 PM, Ganzberger, Michael wrote:
>>>>>>>> Hi Serguei,
>>>>>>>>
>>>>>>>> I'd like to suggest the following, which may help you, from
>>>>>>>> http://esgf.org/wiki/Cmip5Gateway/FAQ
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> "The search does not reflect the latest DB changes I've made
>>>>>>>>
>>>>>>>> You have to manually trigger the 3store harvesting. Logging as root and go
>>>>>>>> to Admin->"Gateway Scheduled Tasks"->"Run tasks" and restart the job named
>>>>>>>> RDFSynchronizationJobDetail"
>>>>>>>>
>>>>>>>> Mike Ganzberger
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu]
>>>>>>>> On Behalf Of Stéphane Senesi
>>>>>>>> Sent: Tuesday, December 20, 2011 9:42 AM
>>>>>>>> To: Serguei Nikonov
>>>>>>>> Cc: Drach, Bob; go-essp-tech at ucar.edu
>>>>>>>> Subject: Re: [Go-essp-tech] Publishing dataset with option --update
>>>>>>>>
>>>>>>>> Serguei
>>>>>>>>
>>>>>>>> We have for some time now experienced similar problems when publishing
>>>>>>>> to the PCMDI gateway, i.e. not getting a "SUCCESS" message when
>>>>>>>> publishing. Sometimes files are actually published (or at least
>>>>>>>> accessible through the gateway, their status being
>>>>>>>> "START_PUBLISHING" according to the esg_list_datasets report),
>>>>>>>> sometimes not. One hypothesis is that the load on the PCMDI Gateway
>>>>>>>> causes the problem. We haven't yet had confirmation from Bob.
>>>>>>>>
>>>>>>>> In contrast to your case, this happens when publishing a dataset from
>>>>>>>> scratch (I mean, not an update).
>>>>>>>>
>>>>>>>> Best regards (do not expect any feedback from me until early January, though)
>>>>>>>>
>>>>>>>> S
>>>>>>>>
>>>>>>>>
>>>>>>>> Serguei Nikonov wrote, On 20/12/2011 18:11:
>>>>>>>>> Hi Bob,
>>>>>>>>>
>>>>>>>>> I needed to add some missing variables to an existing dataset, and I found
>>>>>>>>> the --update option in the esgpublish command. When I tried it I got normal
>>>>>>>>> messages like
>>>>>>>>> INFO 2011-12-20 11:21:00,893 Publishing:
>>>>>>>>> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1, parent = pcmdi.GFDL
>>>>>>>>> INFO 2011-12-20 11:21:07,564 Result: PROCESSING
>>>>>>>>> INFO 2011-12-20 11:21:11,209 Result: PROCESSING
>>>>>>>>> ....
>>>>>>>>>
>>>>>>>>> but nothing happened on the gateway: the new variables are not there. The
>>>>>>>>> files corresponding to these variables are in the database and in the
>>>>>>>>> THREDDS catalog but apparently were not published to the gateway.
>>>>>>>>>
>>>>>>>>> I used the command line
>>>>>>>>> esgpublish --update --keep-version --map <map_file> --project cmip5
>>>>>>>>> --noscan --publish
>>>>>>>>>
>>>>>>>>> Does the map file need to be in some specific format to make it work in
>>>>>>>>> the mode I need?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Sergey Nikonov
>>>>>>>>> GFDL
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> GO-ESSP-TECH mailing list
>>>>>>>>> GO-ESSP-TECH at ucar.edu
>>>>>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>>>>>>
>>>>>>>>>
>>
>
>
