[Go-essp-tech] Publishing dataset with option --update
Drach, Bob
drach1 at llnl.gov
Tue Jan 3 12:40:50 MST 2012
Hi Jamie, Stephen,
I agree with Stephen. The understanding has been that adding new files to a
dataset also triggers a new dataset version. It is possible to override this
behavior, but the default is to generate a new version number when files
have been added, modified, or deleted from an existing dataset.
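
To illustrate that default rule, here is a minimal sketch in Python. It is not the publisher's actual code. The manifest dictionaries (file path mapped to checksum) and the function names are hypothetical, and the only thing taken from this thread is the vYYYYMMDD version form.

```python
from datetime import date


def describe_changes(published, current):
    """Return sorted lists of added, deleted, and modified file paths.

    Both arguments are hypothetical manifests mapping file path -> checksum.
    """
    added = sorted(set(current) - set(published))
    deleted = sorted(set(published) - set(current))
    modified = sorted(
        path for path in set(published) & set(current)
        if published[path] != current[path]
    )
    return added, deleted, modified


def next_version(published, current, today):
    """Return a vYYYYMMDD label if any file changed, otherwise None."""
    added, deleted, modified = describe_changes(published, current)
    if not (added or deleted or modified):
        return None  # nothing changed, so the published version stands
    return "v" + today.strftime("%Y%m%d")


# Example: one new file is added to an already published dataset.
published = {"tas_Amon.nc": "abc", "pr_Amon.nc": "def"}
current = dict(published)
current["hus_Amon.nc"] = "123"

print(describe_changes(published, current))
print(next_version(published, current, date(2012, 1, 3)))
```

The point of the sketch is only that any difference between the published file set and the current one (an addition, a deletion, or a changed checksum) yields a new version label, so two users holding the same version number hold the same files.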
Regards,
--Bob
On 12/30/11 11:12 AM, "stephen.pascoe at stfc.ac.uk"
<stephen.pascoe at stfc.ac.uk> wrote:
> Hi Jamie,
>
> My understanding was that we had agreed that once a dataset version had been
> published (i.e. is available at a Gateway) no files would be
> added/deleted/changed in that version; any changes to the dataset would
> trigger a new version. This is the only sensible way of having versions at
> all and breaking this rule means users can't be confident that their version
> is consistent with someone else's at the same version number.
>
> I'm sure you already knew this was BADC's position and want to clarify what
> other centres understand by the versioning rules. No-one has ever
> contradicted my understanding in emails or telcos but in the end it is up to
> individual datanode administrators to keep to these rules.
>
> Cheers,
> Stephen.
>
> ---
> Stephen Pascoe +44 (0)1235 445980
> Centre of Environmental Data Archival
> STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
>
> -----Original Message-----
> From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On
> Behalf Of Kettleborough, Jamie
> Sent: 22 December 2011 09:24
> To: Drach, Bob; Serguei Nikonov; Nathan Wilhelmi
> Cc: go-essp-tech at ucar.edu
> Subject: Re: [Go-essp-tech] Publishing dataset with option --update
>
> Hello Karl, Bob,
>
> Sorry to labour this, but can you clarify (I don't know enough about map files
> and esgpublish to know the answer). Do you expect addition of new files to a
> currently published data set to trigger a new DRS publication version (so the
> vYYYYMMDD bit changes in the DRS)?
>
> If not can you clarify under what circumstances you expect data publishers to
> generate new DRS publication versions, and when it's OK for them to update a
> current version. [I think you - users, data providers, data node admins etc -
> get a better view of the history of the dataset if you always update the
> version - which I think ties in with what Ashish said. Even if this isn't ESG
> policy I think it is a good policy for CMIP5.]
>
> I *suspect* there may be a communication issue here and not everyone has the
> same understanding of what should happen.
>
> Thanks,
>
> Jamie
>
>
>
>> -----Original Message-----
>> From: go-essp-tech-bounces at ucar.edu
>> [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Drach, Bob
>> Sent: 22 December 2011 01:42
>> To: Serguei Nikonov; Nathan Wilhelmi
>> Cc: go-essp-tech at ucar.edu
>> Subject: Re: [Go-essp-tech] Publishing dataset with option --update
>>
>> Hi Sergey,
>>
>> The way I would recommend adding new files to an existing dataset is as
>> follows:
>>
>> - Unpublish the previous dataset from the gateway and thredds
>>
>> % esgunpublish cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1
>>
>> - Add the new files to the existing mapfile for the dataset
>> they are being added to.
>>
>> - Republish with the expanded mapfile:
>>
>> % esgpublish --read-files --map newmap.txt --project cmip5 --thredds --publish
>>
>> The publisher will:
>> - not rescan existing files, only the new files
>> - create a new version to reflect the additional files
>>
>>
>> Alternatively you can create a mapfile with *only* the new
>> files (using esgscan_directory), then republish using the
>> --update option.
>>
>> --Bob
>>
>>
>> On 12/21/11 8:40 AM, "Serguei Nikonov"
>> <serguei.nikonov at noaa.gov> wrote:
>>
>>> Hi Nate,
>>>
>>> unfortunately this is not the only dataset I have a problem with - there
>>> are at least 5 more. Should I unpublish them locally (db, thredds) and
>>> then create a new version containing the full set of files? What is the
>>> official way to update a dataset?
>>>
>>> Thanks,
>>> Sergey
>>>
>>>
>>> On 12/20/2011 07:06 PM, Nathan Wilhelmi wrote:
>>>> Hi Bob/Mike,
>>>>
>>>> I believe the problem is that when files were added the timestamp on
>>>> the dataset wasn't updated.
>>>>
>>>> The triple store will only harvest datasets that have files and an
>>>> updated timestamp after the last harvest.
>>>>
>>>> So what likely happened is the dataset was created without files, so
>>>> it wasn't initially harvested. Files were subsequently added, but the
>>>> timestamp wasn't updated, so it was still not a candidate for
>>>> harvesting.
>>>>
>>>> Can you update the date_updated timestamp for the dataset in question
>>>> and then trigger the RDF harvesting? I believe the dataset will show
>>>> up then.
>>>>
>>>> Thanks!
>>>> -Nate
>>>>
>>>> On 12/20/2011 11:49 AM, Serguei Nikonov wrote:
>>>>> Hi Mike,
>>>>>
>>>>> I am a member of the data publishers group. I have been publishing a
>>>>> considerable amount of data without this kind of trouble, but this
>>>>> one occurred only when I tried to add some files to an existing
>>>>> dataset. Publishing from scratch works fine for me.
>>>>>
>>>>> Thanks,
>>>>> Sergey
>>>>>
>>>>> On 12/20/2011 01:29 PM, Ganzberger, Michael wrote:
>>>>>> Hi Serguei,
>>>>>>
>>>>>> That task is on a scheduler and will re-run every 10 minutes. If
>>>>>> your data does not appear after that time then perhaps there is
>>>>>> another issue. One issue could be that publishing to the gateway
>>>>>> requires that you have the role of "Data Publisher";
>>>>>>
>>>>>> "check that the account is member of the proper group and has the
>>>>>> special role of Data Publisher"
>>>>>>
>>>>>> http://esgf.org/wiki/ESGFNode/FAQ
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Serguei Nikonov [mailto:serguei.nikonov at noaa.gov]
>>>>>> Sent: Tuesday, December 20, 2011 10:12 AM
>>>>>> To: Ganzberger, Michael
>>>>>> Cc: Stéphane Senesi; Drach, Bob; go-essp-tech at ucar.edu
>>>>>> Subject: Re: [Go-essp-tech] Publishing dataset with option --update
>>>>>>
>>>>>> Hi Mike,
>>>>>>
>>>>>> thanks for the suggestion but I don't have any privileges to do
>>>>>> anything on the gateway.
>>>>>> I am just publishing data on the GFDL data node.
>>>>>>
>>>>>> Regards,
>>>>>> Sergey
>>>>>>
>>>>>> On 12/20/2011 01:05 PM, Ganzberger, Michael wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Serguei,
>>>>>>>
>>>>>>> I'd like to suggest the following, which may help you, from
>>>>>>> http://esgf.org/wiki/Cmip5Gateway/FAQ
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> "The search does not reflect the latest DB changes I've made
>>>>>>>
>>>>>>> You have to manually trigger the 3store harvesting. Logging as
>>>>>>> root and go to Admin->"Gateway Scheduled Tasks"->"Run tasks" and
>>>>>>> restart the job named RDFSynchronizationJobDetail"
>>>>>>>
>>>>>>> Mike Ganzberger
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: go-essp-tech-bounces at ucar.edu
>>>>>>> [mailto:go-essp-tech-bounces at ucar.edu]
>>>>>>> On Behalf Of Stéphane Senesi
>>>>>>> Sent: Tuesday, December 20, 2011 9:42 AM
>>>>>>> To: Serguei Nikonov
>>>>>>> Cc: Drach, Bob; go-essp-tech at ucar.edu
>>>>>>> Subject: Re: [Go-essp-tech] Publishing dataset with option --update
>>>>>>>
>>>>>>> Serguei
>>>>>>>
>>>>>>> We have for some time now experienced similar problems when
>>>>>>> publishing to the PCMDI gateway, i.e. not getting a "SUCCESS"
>>>>>>> message when publishing. Sometimes files are actually published
>>>>>>> (or at least accessible through the gateway, their status being
>>>>>>> "START_PUBLISHING" according to the esg_list_datasets report),
>>>>>>> sometimes not. One hypothesis is that the load on the PCMDI Gateway
>>>>>>> causes the problem. We haven't yet got confirmation from Bob.
>>>>>>>
>>>>>>> In contrast to your case, this happens when publishing a dataset
>>>>>>> from scratch (I mean, not an update)
>>>>>>>
>>>>>>> Best regards (do not expect any feedback from me before early
>>>>>>> January, though)
>>>>>>>
>>>>>>> S
>>>>>>>
>>>>>>>
>>>>>>> Serguei Nikonov wrote, On 20/12/2011 18:11:
>>>>>>>> Hi Bob,
>>>>>>>>
>>>>>>>> I needed to add some missing variables to an existing dataset and I
>>>>>>>> found an option --update in the esgpublish command. When I tried it
>>>>>>>> I got normal messages like
>>>>>>>>
>>>>>>>> INFO 2011-12-20 11:21:00,893 Publishing: cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1, parent = pcmdi.GFDL
>>>>>>>> INFO 2011-12-20 11:21:07,564 Result: PROCESSING
>>>>>>>> INFO 2011-12-20 11:21:11,209 Result: PROCESSING
>>>>>>>> ....
>>>>>>>>
>>>>>>>> but nothing happened on the gateway - the new variables are not there.
>>>>>>>> The files corresponding to these variables are in the database and in
>>>>>>>> the THREDDS catalog but apparently were not published on the gateway.
>>>>>>>>
>>>>>>>> I used the command line
>>>>>>>> esgpublish --update --keep-version --map <map_file> --project cmip5 --noscan --publish
>>>>>>>>
>>>>>>>> Should the map file be in some specific format to make it work in
>>>>>>>> the mode I need?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Sergey Nikonov
>>>>>>>> GFDL
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> GO-ESSP-TECH mailing list
>>>>>>>> GO-ESSP-TECH at ucar.edu
>>>>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>