[Go-essp-tech] Fwd: Re: Publishing dataset with option --update

Drach, Bob drach1 at llnl.gov
Wed Jan 11 13:50:11 MST 2012


Hi Sergey,

I'll check. Sometimes if the gateway is heavily utilized (as it often is) it
takes a while. If they still don't show up, I have had some success in
'forcing' the harvesting on our side.

Regards,

--Bob


On 1/11/12 6:41 AM, "Serguei Nikonov" <serguei.nikonov at noaa.gov> wrote:

> Hi Bob,
>
> yesterday some strange things happened when I unpublished existing datasets
> and then published them with the 'update' option. I processed 5 datasets
> (cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r{1,2,3,4,5}i1p1)
> in one pool and they all went through the publishing process with a
> SUCCESSFUL message. But only r1 and r2 are visible in the gateway GUI. The
> others are reachable only through a direct link (e.g.,
> http://pcmdi3.llnl.gov/esgcet/dataset/cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r3i1p1.html).
>
>
> Thanks,
> Sergey
>
> On 01/06/2012 03:53 PM, Drach, Bob wrote:
>> Hi Sergey,
>>
>> When updating a dataset, it's also important to unpublish it before
>> publishing the new version. E.g., first run
>>
>> esgunpublish <dataset_id>
>>
>> The reason is that, when you publish to the gateway, the gateway software
>> tries to *add* the new information to the existing dataset entry, rather than
>> replace it.
>>
>> --Bob
>> ________________________________________
>> From: Serguei Nikonov [serguei.nikonov at noaa.gov]
>> Sent: Friday, January 06, 2012 10:45 AM
>> To: Eric Nienhouse
>> Cc: Bob Drach; go-essp-tech at ucar.edu
>> Subject: Re: [Go-essp-tech] Fwd: Re:  Publishing dataset with option --update
>>
>> Hi Eric,
>>
>> thanks for your help. I have no objections to whatever versioning policy is
>> adopted. What I need is to know how to apply it. The ways I used did not work
>> for me. Hopefully, the reason is the bad entries in thredds and the database
>> that you pointed out. I am cleaning them up right now, then will see...
>>
>> Just for clarification: if I need to update a dataset (with a version
>> change), I create a map file containing the full set of files (old and new
>> ones) and then use this map file in the esgpublish script with the option
>> --update. Is that correct? Will it be enough to create a new version of the
>> dataset? BTW, there is nothing about versions for the option 'update' in the
>> esgpublish help.
>>
>> Thanks,
>> Sergey
>>
>>
>>
>> On 01/04/2012 04:27 PM, Eric Nienhouse wrote:
>>> Hi Serguei,
>>>
>>> Following are a few more suggestions to diagnose this publishing issue. I
>>> agree with others on this thread that adding new files (or changing existing
>>> ones) should always trigger a new dataset version.
>>>
>>> It does not appear you are receiving a final "SUCCESS" or failure message
>>> when publishing to the Gateway (with esgpublish --publish). Please try
>>> increasing your "polling" levels in your $ESGINI file. E.g.:
>>>
>>> hessian_service_polling_delay = 10
>>> hessian_service_polling_iterations = 500
>>>
>>> You should see a final "SUCCESS" or "ERROR" with Java trace output at the
>>> termination of the command.
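
The two polling settings above can be pictured as a bounded retry loop: ask for the publication status, wait, and ask again until a final status arrives or the iteration budget runs out. The following is a minimal sketch of that idea, not the publisher's actual code. The function name `poll_publish_status`, the `get_status` callable, the set of terminal status strings, and the reading of the delay as seconds are all assumptions made for illustration.

```python
import time
from typing import Callable


def poll_publish_status(get_status: Callable[[], str],
                        delay: float = 10,
                        iterations: int = 500,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until a terminal status is seen or the iteration budget is spent.

    `delay` and `iterations` mirror hessian_service_polling_delay and
    hessian_service_polling_iterations; treating the delay as seconds is
    an assumption, as are the terminal status strings below.
    """
    terminal = {"SUCCESS", "SUCCESSFUL", "ERROR"}  # assumed final states
    status = "PROCESSING"
    for _ in range(iterations):
        status = get_status()
        if status in terminal:
            return status
        sleep(delay)
    # Budget exhausted: the caller never saw a final status.
    return status


# Hypothetical status source that reports success on the fourth poll.
responses = iter(["PROCESSING", "PROCESSING", "PROCESSING", "SUCCESSFUL"])
result = poll_publish_status(lambda: next(responses), delay=0, iterations=10)
print(result)
```

With a small iteration budget the loop can end while the status is still PROCESSING, which is one way a client may never report a final "SUCCESS" or "ERROR" even though the gateway is still working.
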
>>>
>>> I've reviewed the Thredds catalog for the dataset you note below:
>>>
>>>
>>> http://esgdata.gfdl.noaa.gov/thredds/esgcet/1/cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.v2.xml
>>>
>>>
>>> There appear to be multiple instances of certain files within the catalog,
>>> which is a problem. The Gateway publish will fail if a particular file (URL)
>>> is referenced multiple times with differing metadata. An example is:
>>>
>>>
>>> */gfdl_dataroot/NOAA-GFDL/GFDL-CM3/historical/mon/atmos/Amon/r1i1p1/v20110601/rtmt/rtmt_Amon_GFDL-CM3_historical_r1i1p1_186001-186412.nc
>>>
>>>
>>> This file appears as two separate file versions in the Thredds catalog (one
>>> with id ending in ".nc" and another with ".nc_0"). There should be only one
>>> reference to this file URL in the catalog.
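
A check for this kind of duplication can be sketched in a few lines. The example below assumes the catalog is THREDDS XML in which each file is a `dataset` element carrying `ID` and `urlPath` attributes; the function name `find_duplicate_url_paths` and the sample catalog content are hypothetical and much simpler than a real ESG catalog.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def find_duplicate_url_paths(catalog_xml: str) -> dict:
    """Return {urlPath: [dataset IDs]} for paths referenced more than once."""
    root = ET.fromstring(catalog_xml)
    ids_by_path = defaultdict(set)
    for elem in root.iter():
        # Compare the local tag name so the XML namespace does not matter.
        if elem.tag.rsplit("}", 1)[-1] != "dataset":
            continue
        url_path = elem.get("urlPath")
        if url_path:
            ids_by_path[url_path].add(elem.get("ID", ""))
    return {path: sorted(ids)
            for path, ids in ids_by_path.items() if len(ids) > 1}


# Hypothetical, heavily simplified catalog with one duplicated file entry.
SAMPLE = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <dataset name="demo" ID="demo.v2">
    <dataset name="a.nc" ID="demo.v2.a.nc" urlPath="root/a.nc"/>
    <dataset name="a.nc" ID="demo.v2.a.nc_0" urlPath="root/a.nc"/>
    <dataset name="b.nc" ID="demo.v2.b.nc" urlPath="root/b.nc"/>
  </dataset>
</catalog>"""

duplicates = find_duplicate_url_paths(SAMPLE)
print(duplicates)
```

Any path that appears in the result is referenced by more than one file entry, which is the situation described above for the ".nc" and ".nc_0" ids.
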
>>>
>>> The previous version of the dataset in the publisher/node database may be
>>> leading to this issue. You may need to add "--database-delete" to your
>>> esgunpublish command to clean things up. Bob can advise on this. Note that
>>> the original esgpublish command shown in this email thread included
>>> "--keep-version".
>>>
>>> After publishing to the Gateway successfully, you can check the dataset
>>> details by URL with the published dataset identifier. For example:
>>>
>>>
>>> http://pcmdi3.llnl.gov/esgcet/dataset/cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.html
>>>
>>>
>>> I hope this helps.
>>>
>>> Regards,
>>>
>>> -Eric
>>>
>>> Serguei Nikonov wrote:
>>>> Hi Bob,
>>>>
>>>> I still cannot get dataset updates to work. The commands you suggested
>>>> executed successfully, but the datasets did not appear on the gateway. I
>>>> tried it several times for different datasets, but the result is the same.
>>>>
>>>> Do you have any idea what to do in such a situation?
>>>>
>>>> Here are some details about what I tried.
>>>> I needed to add a file to dataset
>>>> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.
>>>> As you advised, I unpublished it (esgunpublish
>>>> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1), then
>>>> created a full mapfile (with the additional file) and published it:
>>>> esgpublish --read-files --map new_mapfile --project cmip5 --thredd --publish
>>>>
>>>> As I said, there were no errors. The dataset is in the database and in
>>>> thredds but not in the gateway.
>>>>
>>>> The second way I tried is using a mapfile containing only the files to
>>>> update. I needed to replace several existing files in a dataset with new
>>>> ones. I created a mapfile with only the files to be replaced:
>>>> esgscan_directory --read-files --project cmip5 -o mapfile.txt
>>>> /data/CMIP5/output1/NOAA-GFDL/GFDL-ESM2M/historical/mon/ocean/Omon/r1i1p1/v20111206
>>>>
>>>> and then published it with the update option:
>>>> esgpublish --update --map mapfile.txt --project cmip5 --thredd --publish.
>>>>
>>>> The result is the same as in the previous case: everything is fine locally,
>>>> but nothing happened on the gateway.
>>>>
>>>> Thanks,
>>>> Sergey
>>>>
>>>> -------- Original Message --------
>>>> Subject: Re: [Go-essp-tech] Publishing dataset with option --update
>>>> Date: Thu, 29 Dec 2011 11:02:05 -0500
>>>> From: Serguei Nikonov<Serguei.Nikonov at noaa.gov>
>>>> Organization: GFDL
>>>> To: Drach, Bob<drach1 at llnl.gov>
>>>> CC: Nathan Wilhelmi<wilhelmi at ucar.edu>, "Ganzberger, Michael"
>>>> <Ganzberger1 at llnl.gov>, "go-essp-tech at ucar.edu"<go-essp-tech at ucar.edu>
>>>>
>>>> Hi Bob,
>>>>
>>>> I tried the 1st way you suggested and it worked partially: the dataset was
>>>> created on the datanode with version 2, but it did not show up on the
>>>> gateway. To make sure that it was not a one-off result, I repeated it with
>>>> another dataset, with the same result.
>>>> Now I have 2 datasets on the datanode (visible in the thredds server) but
>>>> they are absent on the gateway:
>>>> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.v2
>>>> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r2i1p1.v2.
>>>>
>>>> Does it make sense to repeat esgpublish with the 'publish' option?
>>>>
>>>> Thanks and Happy New Year,
>>>> Sergey
>>>>
>>>> On 12/21/2011 08:41 PM, Drach, Bob wrote:
>>>>> Hi Sergey,
>>>>>
>>>>> The way I would recommend adding new files to an existing dataset is as
>>>>> follows:
>>>>>
>>>>> - Unpublish the previous dataset from the gateway and thredds
>>>>>
>>>>> % esgunpublish cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1
>>>>>
>>>>> - Add the new files to the existing mapfile for the dataset they are being
>>>>> added to.
>>>>>
>>>>> - Republish with the expanded mapfile:
>>>>>
>>>>> % esgpublish --read-files --map newmap.txt --project cmip5 --thredds --publish
>>>>>
>>>>> The publisher will:
>>>>> - not rescan existing files, only the new files
>>>>> - create a new version to reflect the additional files
>>>>>
>>>>>
>>>>> Alternatively you can create a mapfile with *only* the new files (Using
>>>>> esgscan_directory), then republish using the --update command.
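
The first procedure above (unpublish, then republish with the expanded mapfile) can be scripted so that the two commands always run in the right order. The sketch below only builds and prints the command lines, using exactly the flags quoted in this message; the helper name `republish_commands` is hypothetical, and actually running the commands (for instance with `subprocess.run`) is left to the reader.

```python
import shlex


def republish_commands(dataset_id: str, mapfile: str,
                       project: str = "cmip5") -> list:
    """Build the unpublish and republish command lines, in order."""
    unpublish = ["esgunpublish", dataset_id]
    publish = ["esgpublish", "--read-files", "--map", mapfile,
               "--project", project, "--thredds", "--publish"]
    return [unpublish, publish]


commands = republish_commands(
    "cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1",
    "newmap.txt",
)
for cmd in commands:
    # Print only; nothing is executed in this sketch.
    print(shlex.join(cmd))
```
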
>>>>>
>>>>> --Bob
>>>>>
>>>>>
>>>>> On 12/21/11 8:40 AM, "Serguei Nikonov"<serguei.nikonov at noaa.gov>  wrote:
>>>>>
>>>>>> Hi Nate,
>>>>>>
>>>>>> unfortunately this is not the only dataset I have a problem with - there
>>>>>> are at least 5 more. Should I unpublish them locally (db, thredds) and then
>>>>>> create a new version containing the full set of files? What is the official
>>>>>> way to update a dataset?
>>>>>>
>>>>>> Thanks,
>>>>>> Sergey
>>>>>>
>>>>>>
>>>>>> On 12/20/2011 07:06 PM, Nathan Wilhelmi wrote:
>>>>>>> Hi Bob/Mike,
>>>>>>>
>>>>>>> I believe the problem is that when files were added the timestamp on the
>>>>>>> dataset wasn't updated.
>>>>>>>
>>>>>>> The triple store will only harvest datasets that have files and an
>>>>>>> updated timestamp after the last harvest.
>>>>>>>
>>>>>>> So what likely happened is the dataset was created without files, so it
>>>>>>> wasn't initially harvested. Files were subsequently added, but the
>>>>>>> timestamp wasn't updated, so it was still not a candidate for harvesting.
>>>>>>>
>>>>>>> Can you update the date_updated timestamp for the dataset in question
>>>>>>> and then trigger the RDF harvesting? I believe the dataset will show up
>>>>>>> then.
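
The harvesting rule described here comes down to a two-part test: the dataset must have files, and its timestamp must be newer than the last harvest. The following sketch walks through the scenario in this message under that reading. The `DatasetRecord` class, its field names other than `date_updated`, and the example dates are assumptions for illustration, not the gateway's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DatasetRecord:
    dataset_id: str
    file_count: int
    date_updated: datetime


def is_harvest_candidate(ds: DatasetRecord, last_harvest: datetime) -> bool:
    """A dataset is harvested only if it has files and changed since last harvest."""
    return ds.file_count > 0 and ds.date_updated > last_harvest


created = datetime(2011, 12, 20, 11, 0)
last_harvest = datetime(2011, 12, 20, 11, 10)

# 1. Dataset created without files: skipped by the first harvest.
ds = DatasetRecord("example.dataset", file_count=0, date_updated=created)
before_harvest = is_harvest_candidate(ds, datetime(2011, 12, 20, 10, 50))

# 2. Files added later, but date_updated left unchanged: still skipped.
ds.file_count = 12
stale = is_harvest_candidate(ds, last_harvest)

# 3. After bumping date_updated, the dataset becomes a candidate.
ds.date_updated = datetime(2011, 12, 20, 11, 30)
fresh = is_harvest_candidate(ds, last_harvest)

print(before_harvest, stale, fresh)
```
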
>>>>>>>
>>>>>>> Thanks!
>>>>>>> -Nate
>>>>>>>
>>>>>>> On 12/20/2011 11:49 AM, Serguei Nikonov wrote:
>>>>>>>> Hi Mike,
>>>>>>>>
>>>>>>>> I am a member of the data publishers group. I have published a
>>>>>>>> considerable amount of data without this kind of trouble; this problem
>>>>>>>> occurred only when I tried to add some files to an existing dataset.
>>>>>>>> Publishing from scratch works fine for me.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Sergey
>>>>>>>>
>>>>>>>> On 12/20/2011 01:29 PM, Ganzberger, Michael wrote:
>>>>>>>>> Hi Serguei,
>>>>>>>>>
>>>>>>>>> That task is on a scheduler and will re-run every 10 minutes. If your
>>>>>>>>> data does not appear after that time then perhaps there is another
>>>>>>>>> issue. One issue could be that publishing to the gateway requires that
>>>>>>>>> you have the role of "Data Publisher":
>>>>>>>>>
>>>>>>>>> "check that the account is member of the proper group and has the
>>>>>>>>> special role of Data Publisher"
>>>>>>>>>
>>>>>>>>> http://esgf.org/wiki/ESGFNode/FAQ
>>>>>>>>>
>>>>>>>>> Mike
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Serguei Nikonov [mailto:serguei.nikonov at noaa.gov]
>>>>>>>>> Sent: Tuesday, December 20, 2011 10:12 AM
>>>>>>>>> To: Ganzberger, Michael
>>>>>>>>> Cc: Stéphane Senesi; Drach, Bob; go-essp-tech at ucar.edu
>>>>>>>>> Subject: Re: [Go-essp-tech] Publishing dataset with option --update
>>>>>>>>>
>>>>>>>>> Hi Mike,
>>>>>>>>>
>>>>>>>>> thanks for the suggestion, but I don't have any privileges to do
>>>>>>>>> anything on the gateway.
>>>>>>>>> I am just publishing data on the GFDL data node.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Sergey
>>>>>>>>>
>>>>>>>>> On 12/20/2011 01:05 PM, Ganzberger, Michael wrote:
>>>>>>>>>> Hi Serguei,
>>>>>>>>>>
>>>>>>>>>> I'd like to suggest this that may help you from
>>>>>>>>>> http://esgf.org/wiki/Cmip5Gateway/FAQ
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> "The search does not reflect the latest DB changes I've made
>>>>>>>>>>
>>>>>>>>>> You have to manually trigger the 3store harvesting. Logging as root
>>>>>>>>>> and go to Admin->"Gateway Scheduled Tasks"->"Run tasks" and restart
>>>>>>>>>> the job named RDFSynchronizationJobDetail"
>>>>>>>>>>
>>>>>>>>>> Mike Ganzberger
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: go-essp-tech-bounces at ucar.edu
>>>>>>>>>> [mailto:go-essp-tech-bounces at ucar.edu]
>>>>>>>>>> On Behalf Of Stéphane Senesi
>>>>>>>>>> Sent: Tuesday, December 20, 2011 9:42 AM
>>>>>>>>>> To: Serguei Nikonov
>>>>>>>>>> Cc: Drach, Bob; go-essp-tech at ucar.edu
>>>>>>>>>> Subject: Re: [Go-essp-tech] Publishing dataset with option --update
>>>>>>>>>>
>>>>>>>>>> Serguei
>>>>>>>>>>
>>>>>>>>>> We have for some time now experienced similar problems when
>>>>>>>>>> publishing to the PCMDI gateway, i.e. not getting a "SUCCESS" message
>>>>>>>>>> when publishing. Sometimes files are actually published (or at least
>>>>>>>>>> accessible through the gateway, their status being actually
>>>>>>>>>> "START_PUBLISHING" according to the esg_list_datasets report),
>>>>>>>>>> sometimes not. One hypothesis is that the PCMDI Gateway load causes
>>>>>>>>>> the problem. We haven't yet got a confirmation from Bob.
>>>>>>>>>>
>>>>>>>>>> In contrast to your case, this happens when publishing a dataset from
>>>>>>>>>> scratch (I mean, not an update).
>>>>>>>>>>
>>>>>>>>>> Best regards (do not expect any feedback from me until early January,
>>>>>>>>>> though)
>>>>>>>>>>
>>>>>>>>>> S
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Serguei Nikonov wrote, On 20/12/2011 18:11:
>>>>>>>>>>> Hi Bob,
>>>>>>>>>>>
>>>>>>>>>>> I needed to add some missing variables to an existing dataset and I
>>>>>>>>>>> found an option --update in the esgpublish command. When I tried it
>>>>>>>>>>> I got a normal message like
>>>>>>>>>>> INFO 2011-12-20 11:21:00,893 Publishing:
>>>>>>>>>>> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1,
>>>>>>>>>>> parent = pcmdi.GFDL
>>>>>>>>>>> INFO 2011-12-20 11:21:07,564 Result: PROCESSING
>>>>>>>>>>> INFO 2011-12-20 11:21:11,209 Result: PROCESSING
>>>>>>>>>>> ....
>>>>>>>>>>>
>>>>>>>>>>> but nothing happened on the gateway - the new variables are not
>>>>>>>>>>> there. The files corresponding to these variables are in the
>>>>>>>>>>> database and in the THREDDS catalog but apparently were not
>>>>>>>>>>> published on the gateway.
>>>>>>>>>>>
>>>>>>>>>>> I used the command line
>>>>>>>>>>> esgpublish --update --keep-version --map <map_file> --project cmip5
>>>>>>>>>>> --noscan --publish.
>>>>>>>>>>>
>>>>>>>>>>> Should the map file be of some specific format to make it work in
>>>>>>>>>>> the mode I need?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Sergey Nikonov
>>>>>>>>>>> GFDL
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> GO-ESSP-TECH mailing list
>>>>>>>>>>> GO-ESSP-TECH at ucar.edu
>>>>>>>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>>>>>>>>
>>>>>>>>>>>
>>>
>>>
>>
>>
>


