[Go-essp-tech] Publishing dataset with option --update

George Huffman huffman at agnes.gsfc.nasa.gov
Fri Dec 30 12:41:17 MST 2011


Stephen, Jamie, and all - as a humble data producer I normally only lurk 
on this list, but ...

The observational dataset that I'm providing for CMIP5 (TRMM 
Multi-satellite Precipitation Analysis) is active in its source 
location, with additional data being pumped out once a month.  This 
thread raises the interesting question of how to accommodate these 
additional months of data over on the Gateway framework.  For the 
specific case of CMIP5 it might be sufficient to deliver a "once and for 
all" static data set that is essentially historical, but I'm sure there 
will soon be other users interested in getting the subsequent months of 
data for other studies.  How might these be added gracefully? 
Republishing the entire dataset in order to add one more month or even 
one more year seems awkward for us, and somewhat confusing for the 
users, who would be confronted by escalating version numbers that 
indicate nothing more than changes in the length of the data record.
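
To make the concern concrete, here is a minimal Python sketch of the rule as it is stated in the quoted thread below: any change to a published file set triggers a new version, labelled vYYYYMMDD. The function names and file names are hypothetical and are not part of esgpublish. The comparison looks only at file names, so a changed file with an unchanged name would need a checksum comparison as well.

```python
from datetime import date


def drs_version(day):
    """Format a date as a DRS-style version label, vYYYYMMDD."""
    return day.strftime("v%Y%m%d")


def needs_new_version(published, current):
    """Under the strict rule, any difference in the file set means a new version."""
    return set(published) != set(current)


# Hypothetical file names for a monthly product.
published = {"pr_199801.nc", "pr_199802.nc"}
current = published | {"pr_199803.nc"}  # one more month arrives

if needs_new_version(published, current):
    print(drs_version(date(2011, 12, 30)))
```

With a monthly product, this condition is true every month, which is exactly the escalation of version numbers described above.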

Have a safe and happy New Year,
George

On 12/30/11 2:12 PM, stephen.pascoe at stfc.ac.uk wrote:
> Hi Jamie,
>
> My understanding was that we had agreed that once a dataset version had been published (i.e. is available at a Gateway) no files would be added/deleted/changed in that version; any changes to the dataset would trigger a new version.  This is the only sensible way of having versions at all, and breaking this rule means users can't be confident that their version is consistent with someone else's at the same version number.
>
> I'm sure you already knew this was BADC's position and want to clarify what other centres understand by the versioning rules.  No-one has ever contradicted my understanding in emails or telcos but in the end it is up to individual datanode administrators to keep to these rules.
>
> Cheers,
> Stephen.
>
> ---
> Stephen Pascoe  +44 (0)1235 445980
> Centre of Environmental Data Archival
> STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
>
> -----Original Message-----
> From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Kettleborough, Jamie
> Sent: 22 December 2011 09:24
> To: Drach, Bob; Serguei Nikonov; Nathan Wilhelmi
> Cc: go-essp-tech at ucar.edu
> Subject: Re: [Go-essp-tech] Publishing dataset with option --update
>
> Hello Karl, Bob,
>
> Sorry to labour this, but can you clarify (I don't know enough about map files and esgpublish to know the answer).  Do you expect addition of new files to a currently published data set to trigger a new DRS publication version (so the vYYYYMMDD bit changes in the DRS)?
>
> If not, can you clarify under what circumstances you expect data publishers to generate new DRS publication versions, and when it's OK for them to update a current version.  [I think you - users, data providers, data node admins etc - get a better view of the history of the dataset if you always update the version - which I think ties in with what Ashish said.  Even if this isn't ESG policy I think it is a good policy for CMIP5.]
>
> I *suspect* there may be a communication issue here and not everyone has the same understanding of what should happen.
>
> Thanks,
>
> Jamie
>
>
>
>> -----Original Message-----
>> From: go-essp-tech-bounces at ucar.edu
>> [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Drach, Bob
>> Sent: 22 December 2011 01:42
>> To: Serguei Nikonov; Nathan Wilhelmi
>> Cc: go-essp-tech at ucar.edu
>> Subject: Re: [Go-essp-tech] Publishing dataset with option --update
>>
>> Hi Sergey,
>>
>> The way I would recommend adding new files to an existing dataset is as
>> follows:
>>
>> - Unpublish the previous dataset from the gateway and thredds
>>
>> % esgunpublish cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1
>>
>> - Add the new files to the existing mapfile for the dataset
>> they are being added to.
>>
>> - Republish with the expanded mapfile:
>>
>> % esgpublish --read-files --map newmap.txt --project cmip5 --thredds --publish
>>
>> The publisher will:
>> - not rescan existing files, only the new files
>> - create a new version to reflect the additional files
>>
>>
>> Alternatively you can create a mapfile with *only* the new
>> files (using esgscan_directory), then republish using the
>> --update option.
>>
>> --Bob
>>
>>
>> On 12/21/11 8:40 AM, "Serguei Nikonov"
>> <serguei.nikonov at noaa.gov>  wrote:
>>
>>> Hi Nate,
>>>
>>> unfortunately this is not the only dataset I have a problem with - there
>>> are at least 5 more. Should I unpublish them locally (db, thredds) and
>>> then create a new version containing the full set of files? What is the
>>> official way to update a dataset?
>>>
>>> Thanks,
>>> Sergey
>>>
>>>
>>> On 12/20/2011 07:06 PM, Nathan Wilhelmi wrote:
>>>> Hi Bob/Mike,
>>>>
>>>> I believe the problem is that when files were added the timestamp on
>>>> the dataset wasn't updated.
>>>>
>>>> The triple store will only harvest datasets that have files and an
>>>> updated timestamp after the last harvest.
>>>>
>>>> So what likely happened is the dataset was created without files, so
>>>> it wasn't initially harvested. Files were subsequently added, but the
>>>> timestamp wasn't updated, so it was still not a candidate for
>>>> harvesting.
>>>>
>>>> Can you update the date_updated timestamp for the dataset in question
>>>> and then trigger the RDF harvesting? I believe the dataset will show
>>>> up then.
>>>>
>>>> Thanks!
>>>> -Nate
>>>>
>>>> On 12/20/2011 11:49 AM, Serguei Nikonov wrote:
>>>>> Hi Mike,
>>>>>
>>>>> I am a member of the data publishers group. I have been publishing a
>>>>> considerable amount of data without this kind of trouble, but this
>>>>> one occurred only when I tried to add some files to an existing
>>>>> dataset. Publishing from scratch works fine for me.
>>>>>
>>>>> Thanks,
>>>>> Sergey
>>>>>
>>>>> On 12/20/2011 01:29 PM, Ganzberger, Michael wrote:
>>>>>> Hi Serguei,
>>>>>>
>>>>>> That task is on a scheduler and will re-run every 10 minutes. If
>>>>>> your data does not appear after that time then perhaps there is
>>>>>> another issue. One issue could be that publishing to the gateway
>>>>>> requires that you have the role of "Data Publisher":
>>>>>>
>>>>>> "check that the account is member of the proper group and has the
>>>>>> special role of Data Publisher"
>>>>>>
>>>>>> http://esgf.org/wiki/ESGFNode/FAQ
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Serguei Nikonov [mailto:serguei.nikonov at noaa.gov]
>>>>>> Sent: Tuesday, December 20, 2011 10:12 AM
>>>>>> To: Ganzberger, Michael
>>>>>> Cc: Stéphane Senesi; Drach, Bob; go-essp-tech at ucar.edu
>>>>>> Subject: Re: [Go-essp-tech] Publishing dataset with option --update
>>>>>>
>>>>>> Hi Mike,
>>>>>>
>>>>>> thanks for the suggestion but I don't have any privileges to do
>>>>>> anything on the gateway.
>>>>>> I am just publishing data on the GFDL data node.
>>>>>>
>>>>>> Regards,
>>>>>> Sergey
>>>>>>
>>>>>> On 12/20/2011 01:05 PM, Ganzberger, Michael wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Serguei,
>>>>>>>
>>>>>>> I'd like to suggest this, which may help you, from
>>>>>>> http://esgf.org/wiki/Cmip5Gateway/FAQ
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> "The search does not reflect the latest DB changes I've made
>>>>>>>
>>>>>>> You have to manually trigger the 3store harvesting. Logging as
>>>>>>> root and go to Admin->"Gateway Scheduled Tasks"->"Run tasks" and
>>>>>>> restart the job named RDFSynchronizationJobDetail"
>>>>>>>
>>>>>>> Mike Ganzberger
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: go-essp-tech-bounces at ucar.edu
>>>>>>> [mailto:go-essp-tech-bounces at ucar.edu]
>>>>>>> On Behalf Of Stéphane Senesi
>>>>>>> Sent: Tuesday, December 20, 2011 9:42 AM
>>>>>>> To: Serguei Nikonov
>>>>>>> Cc: Drach, Bob; go-essp-tech at ucar.edu
>>>>>>> Subject: Re: [Go-essp-tech] Publishing dataset with option
>>>>>>> --update
>>>>>>>
>>>>>>> Serguei
>>>>>>>
>>>>>>> We have for some time now experienced similar problems when
>>>>>>> publishing to the PCMDI gateway, i.e. not getting a "SUCCESS"
>>>>>>> message when publishing. Sometimes files are actually published
>>>>>>> (or at least accessible through the gateway, their status being
>>>>>>> actually "START_PUBLISHING" according to the esg_list_datasets
>>>>>>> report), sometimes not. One hypothesis is that the load on the
>>>>>>> PCMDI Gateway causes the problem. We haven't yet had confirmation
>>>>>>> from Bob.
>>>>>>>
>>>>>>> In contrast to your case, this happens when publishing a dataset
>>>>>>> from scratch (I mean, not an update).
>>>>>>>
>>>>>>> Best regards (do not expect any feedback from me until early
>>>>>>> January)
>>>>>>>
>>>>>>> S
>>>>>>>
>>>>>>>
>>>>>>> Serguei Nikonov wrote, On 20/12/2011 18:11:
>>>>>>>> Hi Bob,
>>>>>>>>
>>>>>>>> I needed to add some missing variables to an existing dataset and I
>>>>>>>> found the --update option in the esgpublish command. When I tried it
>>>>>>>> I got normal messages like
>>>>>>>>
>>>>>>>> INFO 2011-12-20 11:21:00,893 Publishing: cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1, parent = pcmdi.GFDL
>>>>>>>> INFO 2011-12-20 11:21:07,564 Result: PROCESSING
>>>>>>>> INFO 2011-12-20 11:21:11,209 Result: PROCESSING ....
>>>>>>>>
>>>>>>>> but nothing happened on the gateway - the new variables are not there.
>>>>>>>> The files corresponding to these variables are in the database and in
>>>>>>>> the THREDDS catalog but apparently were not published on the gateway.
>>>>>>>>
>>>>>>>> I used the command line
>>>>>>>> esgpublish --update --keep-version --map <map_file> --project cmip5 --noscan --publish
>>>>>>>>
>>>>>>>> Should the map file be in some specific format to make it work in
>>>>>>>> the mode I need?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Sergey Nikonov
>>>>>>>> GFDL
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> GO-ESSP-TECH mailing list
>>>>>>>> GO-ESSP-TECH at ucar.edu
>>>>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>
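
Bob's alternative in the thread above (a mapfile containing *only* the new files, then republishing with --update) could be prepared along the following lines. This is a sketch only: it assumes mapfile lines are pipe-delimited with the dataset id first and the file path second, which should be checked against your own mapfiles. The helper names, paths and sizes are hypothetical, and in practice esgscan_directory would generate the scanned entries.

```python
def path_of(line):
    """Return the file path from a mapfile line, or None if it has no path field."""
    fields = [field.strip() for field in line.split("|")]
    return fields[1] if len(fields) >= 2 and fields[1] else None


def only_new_entries(existing_lines, scanned_lines):
    """Keep the scanned entries whose path is not already in the existing mapfile."""
    known = {path_of(line) for line in existing_lines} - {None}
    return [
        line
        for line in scanned_lines
        if path_of(line) is not None and path_of(line) not in known
    ]


# Hypothetical entries; the dataset id is the one used in the thread.
DATASET = "cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1"
existing = [DATASET + " | /data/tas_Amon.nc | 1024"]
scanned = existing + [DATASET + " | /data/pr_Amon.nc | 2048"]

for line in only_new_entries(existing, scanned):
    print(line)
```

The printed lines would form the smaller mapfile; whether republishing from it yields a new version is the policy question discussed in the thread, not something this sketch decides.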

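Nate's explanation in the thread above of why the dataset was never harvested amounts to a two-part condition: the dataset must have files, and its timestamp must be later than the last harvest. The Python sketch below restates that condition as he describes it; the function name, argument names and dates are hypothetical, and this is not the Gateway's actual code.

```python
from datetime import datetime


def is_harvest_candidate(file_count, date_updated, last_harvest):
    """A dataset is harvested only if it has files and was updated after the last harvest."""
    return file_count > 0 and date_updated > last_harvest


created = datetime(2011, 12, 1)        # dataset created without files
last_harvest = datetime(2011, 12, 10)  # harvester has already run since then

# Created empty: skipped because it has no files.
print(is_harvest_candidate(0, created, last_harvest))

# Files added later, but date_updated left unchanged: still skipped.
print(is_harvest_candidate(12, created, last_harvest))

# After date_updated is refreshed, the dataset qualifies.
print(is_harvest_candidate(12, datetime(2011, 12, 20), last_harvest))
```
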
-- 
George J. Huffman, Ph.D.  (Voice)  +1 301-614-6308
Sci. Sys. & Appl., Inc.   (FAX)    +1 301-614-5492
NASA/GSFC Code 612 *NEW*  (Email)  george.j.huffman at nasa.gov
Greenbelt, MD 20771 USA   (Office) Bld. 33 Room C417

