[Go-essp-tech] Fwd: Re: Publishing dataset with option --update

Eric Nienhouse ejn at ucar.edu
Wed Jan 4 14:27:25 MST 2012


Hi Serguei,

Following are a few more suggestions to diagnose this publishing issue.  
I agree with others on this thread that adding new files (or changing 
existing ones) should always trigger a new dataset version.

It does not appear you are receiving a final "SUCCESS" or failure 
message when publishing to the Gateway (with esgpublish --publish).  
Please try increasing your "polling" levels in your $ESGINI file, e.g.:

hessian_service_polling_delay = 10
hessian_service_polling_iterations = 500

You should see a final "SUCCESS" or "ERROR" with Java trace output at 
the termination of the command.
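
As a rough illustration of what these two settings imply, the Python sketch below computes the longest time the publisher would keep polling. It assumes the delay is in seconds, that one poll happens per iteration, and that the options live in a [DEFAULT] section; the fragment and the function name max_polling_seconds are hypothetical, not part of the publisher.

```python
import configparser

# Hypothetical fragment of an $ESGINI file. The [DEFAULT] section name is an
# assumption; the two option names and values come from the message above.
ESGINI_FRAGMENT = """
[DEFAULT]
hessian_service_polling_delay = 10
hessian_service_polling_iterations = 500
"""


def max_polling_seconds(ini_text):
    """Return delay * iterations, i.e. the longest polling window,
    assuming the delay is in seconds and one poll happens per iteration."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    defaults = parser.defaults()
    delay = int(defaults["hessian_service_polling_delay"])
    iterations = int(defaults["hessian_service_polling_iterations"])
    return delay * iterations


if __name__ == "__main__":
    total = max_polling_seconds(ESGINI_FRAGMENT)
    print(f"maximum polling window: {total} s")
```

Under those assumptions the values above allow up to 5000 seconds of polling before the command gives up waiting for a final status.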

I've reviewed the Thredds catalog for the dataset you note below:

  
http://esgdata.gfdl.noaa.gov/thredds/esgcet/1/cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.v2.xml

There appear to be multiple instances of certain files within the 
catalog, which is a problem.  The Gateway publish will fail if a 
particular file (URL) is referenced multiple times with differing 
metadata.  An example is:

  
*/gfdl_dataroot/NOAA-GFDL/GFDL-CM3/historical/mon/atmos/Amon/r1i1p1/v20110601/rtmt/rtmt_Amon_GFDL-CM3_historical_r1i1p1_186001-186412.nc

This file appears as two separate file versions in the Thredds catalog 
(one with id ending in ".nc" and another with ".nc_0").  There should be 
only one reference to this file URL in the catalog.
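
One way to spot such duplicates is to group the dataset IDs in a catalog by the URL path they reference. The Python sketch below does that for a small, made-up catalog. It assumes the URL path appears as a urlPath attribute on a dataset element or on a child access element; the sample catalog, its IDs, and the function names are hypothetical and only mimic the ".nc" / ".nc_0" pattern described above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical miniature catalog, not the real GFDL one.
SAMPLE_CATALOG = """
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <dataset name="example" ID="example.v2">
    <dataset name="rtmt.nc" ID="example.v2.rtmt.nc" urlPath="data/rtmt.nc"/>
    <dataset name="rtmt.nc" ID="example.v2.rtmt.nc_0" urlPath="data/rtmt.nc"/>
    <dataset name="tas.nc" ID="example.v2.tas.nc" urlPath="data/tas.nc"/>
  </dataset>
</catalog>
"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def duplicate_url_paths(catalog_xml):
    """Map each urlPath referenced by more than one dataset ID to those IDs."""
    ids_by_path = defaultdict(set)
    root = ET.fromstring(catalog_xml.strip())
    for element in root.iter():
        if local_name(element.tag) != "dataset":
            continue
        dataset_id = element.get("ID", "<no ID>")
        paths = []
        if element.get("urlPath"):
            paths.append(element.get("urlPath"))
        # Some catalogs put the path on a child <access> element instead.
        for child in element:
            if local_name(child.tag) == "access" and child.get("urlPath"):
                paths.append(child.get("urlPath"))
        for path in paths:
            ids_by_path[path].add(dataset_id)
    return {path: sorted(ids) for path, ids in ids_by_path.items() if len(ids) > 1}


if __name__ == "__main__":
    for path, ids in duplicate_url_paths(SAMPLE_CATALOG).items():
        print(path, "is referenced by", ", ".join(ids))
```

Running the same function on the downloaded catalog XML should list every file URL that is referenced under more than one ID, provided the catalog follows the layout assumed here.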

The previous version of the dataset in the publisher/node database may 
be leading to this issue.  You may need to add "--database-delete" to 
your esgunpublish command to clean things up.  Bob can advise on this.  
Note that the original esgpublish command shown in this email thread 
included "--keep-version".

After publishing to the Gateway successfully, you can check the dataset 
details by URL with the published dataset identifier.  For example:

  
http://pcmdi3.llnl.gov/esgcet/dataset/cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.html
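
Note that the Thredds catalog name above carries a ".v2" suffix while the Gateway URL does not. A small Python sketch of that mapping follows. The URL pattern is inferred from the single example above and is an assumption, and the function name gateway_dataset_url is hypothetical.

```python
import re

# URL pattern inferred from the one example in this message (an assumption).
GATEWAY_DATASET_URL = "http://pcmdi3.llnl.gov/esgcet/dataset/{dataset_id}.html"


def gateway_dataset_url(catalog_id):
    """Drop a trailing '.v<number>' version suffix, if present, and build the
    Gateway details URL for the remaining dataset identifier."""
    dataset_id = re.sub(r"\.v\d+$", "", catalog_id)
    return GATEWAY_DATASET_URL.format(dataset_id=dataset_id)


if __name__ == "__main__":
    print(gateway_dataset_url(
        "cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.v2"
    ))
```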

I hope this helps.

Regards,

-Eric

Serguei Nikonov wrote:
> Hi Bob,
>
> I still cannot get dataset updates to work. The commands you suggested
> executed successfully, but the datasets did not appear on the gateway. I tried
> several times with different datasets, but the result is the same.
>
> Do you have any idea what to do in this situation?
>
> Here are some details about what I tried.
> I needed to add a file to dataset
> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.
> As you advised, I unpublished it (esgunpublish
> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1), then
> created a full mapfile (including the additional file) and published it:
> esgpublish --read-files --map new_mapfile --project cmip5 --thredd --publish
>
> As I said, there were no errors. The dataset is in the database and in thredds
> but not on the gateway.
>
> The second approach I tried uses a mapfile containing only the files to update.
> I needed to replace several existing files in a dataset with new ones, so I
> created a mapfile with only the replacement files:
> esgscan_directory --read-files --project cmip5 -o mapfile.txt 
> /data/CMIP5/output1/NOAA-GFDL/GFDL-ESM2M/historical/mon/ocean/Omon/r1i1p1/v20111206
> and then published it with update option:
> esgpublish --update --map mapfile.txt --project cmip5 --thredd --publish.
>
> The result is the same as in the previous case: everything is fine locally, but
> nothing happened on the gateway.
>
> Thanks,
> Sergey
>
> -------- Original Message --------
> Subject: Re: [Go-essp-tech] Publishing dataset with option --update
> Date: Thu, 29 Dec 2011 11:02:05 -0500
> From: Serguei Nikonov <Serguei.Nikonov at noaa.gov>
> Organization: GFDL
> To: Drach, Bob <drach1 at llnl.gov>
> CC: Nathan Wilhelmi <wilhelmi at ucar.edu>,  "Ganzberger, Michael" 
> <Ganzberger1 at llnl.gov>, "go-essp-tech at ucar.edu" <go-essp-tech at ucar.edu>
>
> Hi Bob,
>
> I tried the first way you suggested and it worked partially: the dataset was
> created on the datanode with version 2, but it did not show up on the gateway.
> To make sure this was not a one-off result, I repeated it with another dataset,
> with the same result.
> Now I have 2 datasets on the datanode (visible in the thredds server), but they
> are absent from the gateway:
> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.v2
> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r2i1p1.v2.
>
> Does it make sense to repeat esgpublish with 'publish' option?
>
> Thanks and Happy New Year,
> Sergey
>
> On 12/21/2011 08:41 PM, Drach, Bob wrote:
>   
>> Hi Sergey,
>>
>> The way I would recommend adding new files to an existing dataset is as
>> follows:
>>
>> - Unpublish the previous dataset from the gateway and thredds
>>
>> % esgunpublish cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1
>>
>> - Add the new files to the existing mapfile for the dataset they are being
>> added to.
>>
>> - Republish with the expanded mapfile:
>>
>> % esgpublish --read-files --map newmap.txt --project cmip5 --thredds --publish
>>
>> The publisher will:
>> - not rescan existing files, only the new files
>> - create a new version to reflect the additional files
>>
>>
>> Alternatively you can create a mapfile with *only* the new files (Using
>> esgscan_directory), then republish using the --update command.
>>
>> --Bob
>>
>>
>> On 12/21/11 8:40 AM, "Serguei Nikonov"<serguei.nikonov at noaa.gov>  wrote:
>>
>>     
>>> Hi Nate,
>>>
>>> unfortunately this is not the only dataset I have a problem with; there are
>>> at least 5 more. Should I unpublish them locally (db, thredds) and then
>>> create a new version containing the full set of files? What is the official
>>> way to update a dataset?
>>>
>>> Thanks,
>>> Sergey
>>>
>>>
>>> On 12/20/2011 07:06 PM, Nathan Wilhelmi wrote:
>>>       
>>>> Hi Bob/Mike,
>>>>
>>>> I believe the problem is that when files were added, the timestamp on the
>>>> dataset wasn't updated.
>>>>
>>>> The triple store will only harvest datasets that have files and a timestamp
>>>> updated after the last harvest.
>>>>
>>>> So what likely happened is that the dataset was created without files, so it
>>>> wasn't initially harvested. Files were subsequently added, but the timestamp
>>>> wasn't updated, so it was still not a candidate for harvesting.
>>>>
>>>> Can you update the date_updated timestamp for the dataset in question and
>>>> then trigger the RDF harvesting? I believe the dataset will show up then.
>>>>
>>>> Thanks!
>>>> -Nate
>>>>
>>>> On 12/20/2011 11:49 AM, Serguei Nikonov wrote:
>>>>         
>>>>> Hi Mike,
>>>>>
>>>>> I am a member of the data publishers group. I have published a considerable
>>>>> amount of data without this kind of trouble; the problem occurred only when
>>>>> I tried to add some files to an existing dataset. Publishing from scratch
>>>>> works fine for me.
>>>>>
>>>>> Thanks,
>>>>> Sergey
>>>>>
>>>>> On 12/20/2011 01:29 PM, Ganzberger, Michael wrote:
>>>>>           
>>>>>> Hi Serguei,
>>>>>>
>>>>>> That task is on a scheduler and will re-run every 10 minutes. If your data
>>>>>> does not appear after that time then perhaps there is another issue. One
>>>>>> issue could be that publishing to the gateway requires that you have the
>>>>>> role of "Data Publisher";
>>>>>>
>>>>>> "check that the account is member of the proper group and has the special
>>>>>> role of Data Publisher"
>>>>>>
>>>>>> http://esgf.org/wiki/ESGFNode/FAQ
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Serguei Nikonov [mailto:serguei.nikonov at noaa.gov]
>>>>>> Sent: Tuesday, December 20, 2011 10:12 AM
>>>>>> To: Ganzberger, Michael
>>>>>> Cc: Stéphane Senesi; Drach, Bob; go-essp-tech at ucar.edu
>>>>>> Subject: Re: [Go-essp-tech] Publishing dataset with option --update
>>>>>>
>>>>>> Hi Mike,
>>>>>>
>>>>>> thanks for the suggestion, but I don't have any privileges to do anything
>>>>>> on the gateway.
>>>>>> I am just publishing data on the GFDL data node.
>>>>>>
>>>>>> Regards,
>>>>>> Sergey
>>>>>>
>>>>>> On 12/20/2011 01:05 PM, Ganzberger, Michael wrote:
>>>>>>             
>>>>>>> Hi Serguei,
>>>>>>>
>>>>>>> I'd like to suggest the following, which may help you, from
>>>>>>> http://esgf.org/wiki/Cmip5Gateway/FAQ
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> "The search does not reflect the latest DB changes I've made
>>>>>>>
>>>>>>> You have to manually trigger the 3store harvesting. Logging as root and go
>>>>>>> to Admin->"Gateway Scheduled Tasks"->"Run tasks" and restart the job named
>>>>>>> RDFSynchronizationJobDetail"
>>>>>>>
>>>>>>> Mike Ganzberger
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu]
>>>>>>> On Behalf Of Stéphane Senesi
>>>>>>> Sent: Tuesday, December 20, 2011 9:42 AM
>>>>>>> To: Serguei Nikonov
>>>>>>> Cc: Drach, Bob; go-essp-tech at ucar.edu
>>>>>>> Subject: Re: [Go-essp-tech] Publishing dataset with option --update
>>>>>>>
>>>>>>> Serguei
>>>>>>>
>>>>>>> We have for some time now experienced similar problems when publishing
>>>>>>> to the PCMDI gateway, i.e. not getting a "SUCCESS" message when
>>>>>>> publishing. Sometimes files are actually published (or at least
>>>>>>> accessible through the gateway, their status being
>>>>>>> "START_PUBLISHING" according to the esg_list_datasets report), sometimes
>>>>>>> not. One hypothesis is that the PCMDI Gateway load causes the problem. We
>>>>>>> haven't yet had confirmation from Bob.
>>>>>>>
>>>>>>> In contrast to your case, this happens when publishing a dataset from
>>>>>>> scratch (I mean, not an update)
>>>>>>>
>>>>>>> Best regards (do not expect any feedback from me until early January)
>>>>>>>
>>>>>>> S
>>>>>>>
>>>>>>>
>>>>>>> Serguei Nikonov wrote, On 20/12/2011 18:11:
>>>>>>>               
>>>>>>>> Hi Bob,
>>>>>>>>
>>>>>>>> I needed to add some missing variables to an existing dataset and found the
>>>>>>>> --update option of the esgpublish command. When I tried it I got normal
>>>>>>>> messages like
>>>>>>>> INFO 2011-12-20 11:21:00,893 Publishing: cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1, parent = pcmdi.GFDL
>>>>>>>> INFO 2011-12-20 11:21:07,564 Result: PROCESSING
>>>>>>>> INFO 2011-12-20 11:21:11,209 Result: PROCESSING
>>>>>>>> ....
>>>>>>>>
>>>>>>>> but nothing happened on the gateway: the new variables are not there. The
>>>>>>>> files corresponding to these variables are in the database and in the
>>>>>>>> THREDDS catalog but apparently were not published on the gateway.
>>>>>>>>
>>>>>>>> I used the command line
>>>>>>>> esgpublish --update --keep-version --map <map_file> --project cmip5 --noscan --publish.
>>>>>>>>
>>>>>>>> Should the map file have a specific format to make it work in the mode I
>>>>>>>> need?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Sergey Nikonov
>>>>>>>> GFDL
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> GO-ESSP-TECH mailing list
>>>>>>>> GO-ESSP-TECH at ucar.edu
>>>>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>               


