[Go-essp-tech] Visibility of old versions, was... RE: Fwd: Re: Publishing dataset with option --update

Karl Taylor taylor13 at llnl.gov
Tue Jan 10 08:40:32 MST 2012


Hi all,

thanks for the good discussion. Some good arguments have been made for
keeping all versions. I'll not make a policy decision immediately, but
am tending toward strong encouragement to keep all versions. I'll
distribute a draft statement about this for your input and comment
before posting. I'll, of course, also consult directly with other IPCC
DDC folks.

Best regards,
Karl

On 1/10/12 5:31 AM, Estanislao Gonzalez wrote:
> Hi Jamie,
>
> Indeed, DOIs are not going to solve everything. The DOI is analogous
> to the ISBN of a book; citing the whole book is not always what you
> want in any case. To continue the analogy, users are indeed working
> with pre-prints which get corrected all the time (i.e. the CMIP5
> archive is in flux). People are writing papers citing a pre-print. Of
> course this makes no sense, but they are not doing so willingly; they
> have to, as the deadline approaches but the computing groups are not
> ready.
>
> So what do we have now? Some archives with a strong commitment for
> preserving data.
> If the DRS were honored, the URL would be enough for citing any file
> as it has the version in it. Indeed citing 1000+ URLs is not
> practical, but a redirection could be added so that the scientist
> cites one URL in which all file URLs are listed (there's no
> implementation for this AFAIK). But at least the URL of DRS committed
> sites could be safely cited, and if the checksum is attached to the
> citation, it is sure that the correct file is always being cited (and
> it could even be found if moved).
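A minimal sketch of the "URL plus checksum" idea above, in Python. The function names, the choice of SHA-256 and the layout of the citation string are assumptions made for illustration only; none of them is an agreed CMIP5 or ESG convention.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def citation_entry(url, path, algorithm="sha256"):
    """Pair a versioned URL with the checksum of the local copy (hypothetical layout)."""
    return "{} ({}:{})".format(url, algorithm, file_checksum(path, algorithm))


def matches_citation(path, cited_checksum, algorithm="sha256"):
    """Check whether a local file is the one that was cited, wherever it now lives."""
    return file_checksum(path, algorithm) == cited_checksum.strip().lower()
```

With something like this, a reader who obtains a file from a different location could still confirm with matches_citation() that it is the file that was cited.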
>
> I don't know how citations are being done now, nor do I know how they
> were done before when everyone was citing data that was almost
> impossible to get. DOIs are the very first step in the right
> direction, not the last one.
> IMHO the community should come up with some best practices to overcome
> the problem we are facing: how to cite something that's permanently
> changing. Sharing this will certainly help everyone.
> Before jumping away from this subject I'd also like to add that I
> don't see any proper communication mechanism in the community. All (or
> at least most) questions regarding CMIP5 are AFAICT directed to the
> help-desk, so mostly developers are trying to help the community
> instead of the community trying to help itself. I think we might be
> missing some kind of platform for doing this. We don't have the means
> to support the growing community (and new communities which we are now
> serving); we need them to help with the "helping". Just a thought....
>
> And last, and probably least, the only way to get the latest version
> of any dataset is by re-issuing the search. Especially since multiple
> datasets are referred to in a wget script, finding the latest versions
> of each of them "by hand" will be more time-consuming than issuing the
> search query again.
>
> Thanks,
> Estani
>
> Am 10.01.2012 13:00, schrieb Kettleborough, Jamie:
>> Hello,
>> I'm not sure how to say this, but I'm not sure it's just down to DOIs
>> to determine whether a data set should always be visible. I think
>> data needs to be visible where it's sufficiently important that a user
>> might want to download it, e.g. they want to check or extend someone
>> else's study (and I think there are other reasons). It's not clear to
>> me that all data of this kind will have a DOI - for instance how many
>> of the datasets referenced in papers being written now for the summer
>> deadline of AR5 have (or will have in time) DOIs?
>> I know it's tempting to say - any dataset referenced in a paper should
>> have a DOI. But I think you need to be realistic about the prospects
>> of this happening on the right timescales.
>> If the DOI is used as the determinant of whether data is always
>> visible then should users be made aware of the risk they are carrying
>> now? For instance, so they know to have local backups of data that is
>> really important to them. (With the possible implication too that
>> they may need to be prepared to 'reshare' this data with others.)
>> For what it's worth my personal preference is with the BADC/DKRZ (and
>> I'm sure others) philosophy of keeping all versions - though I
>> realise there are costs in doing this, like getting DRSlib
>> sufficiently bug free and getting it to work in all the contexts it
>> needs to (hard links/soft links), getting it deployed, getting the
>> update mechanism in place for when new bugs are found etc. If you
>> used DRSlib doesn't Estani's use case that caused user grief become
>> easier too - the wget scripts do not need regenerating, you should
>> instead be able to replace the version strings in the url (though I
>> may be assuming things about load balancing etc in saying this).
>> Jamie
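The "replace the version strings in the url" idea can be sketched in Python as below. This is only an illustration: it assumes the version appears as a DRS-style path segment of the form /vYYYYMMDD/ (as in the v20110601 paths quoted further down the thread), and the function name and example host are made up. As Jamie notes, it also assumes the same host serves both versions and that the file exists in the new version.

```python
import re

# Assumed layout: the version is a path segment such as /v20110601/.
VERSION_SEGMENT = re.compile(r"/v\d{8}/")


def retarget_version(url, new_version):
    """Return url with its version path segment replaced by new_version (8 digits)."""
    if not re.fullmatch(r"\d{8}", new_version):
        raise ValueError("version must be 8 digits, e.g. 20111206")
    updated, count = VERSION_SEGMENT.subn("/v{}/".format(new_version), url, count=1)
    if count == 0:
        raise ValueError("no version segment found in URL")
    return updated


if __name__ == "__main__":
    # example.org is a placeholder host, not a real data node.
    old_url = ("http://example.org/thredds/fileServer/gfdl_dataroot/NOAA-GFDL/GFDL-CM3/"
               "historical/mon/atmos/Amon/r1i1p1/v20110601/rtmt/"
               "rtmt_Amon_GFDL-CM3_historical_r1i1p1_186001-186412.nc")
    print(retarget_version(old_url, "20111206"))
```

Applied to every URL in an existing wget script, this would avoid regenerating the script, at the cost of not discovering files that were added or removed in the new version.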
>>
>>     ------------------------------------------------------------------------
>>     *From:* go-essp-tech-bounces at ucar.edu
>>     [mailto:go-essp-tech-bounces at ucar.edu] *On Behalf Of *Estanislao
>>     Gonzalez
>>     *Sent:* 10 January 2012 10:21
>>     *To:* Karl Taylor
>>     *Cc:* Drach, Bob; go-essp-tech at ucar.edu; serguei.nikonov at noaa.gov
>>     *Subject:* Re: [Go-essp-tech] Fwd: Re: Publishing dataset with
>>     option --update
>>
>>     Well to be honest I do agree this is a decision each institution
>>     has to make, but for us I'd prefer offering everything we have
>>     and let the systems decide what to do with this information. I.e.
>>     I've used it to generate some comments (I might have already shown
>>     you this), just go here:
>>     http://ipcc-ar5.dkrz.de/dataset/cmip5.output1.NCC.NorESM1-M.sstClim.mon.land.Lmon.r1i1p1.html
>>     and click on history.
>>     That information could be generated only because we store the
>>     metadata to the previous version.
>>
>>     By the way, the only way of inhibiting the user from getting an
>>     older version, if that's what is wanted, is by either removing
>>     the files from the TDS served directory, or changing the access
>>     restriction at the Gateway. Because of a well-known TDS bug (or
>>     feature) files present at that directory and not found in any
>>     catalog are served without any restriction (AFAIK no certificate
>>     is required for this). So, normally the wget script would work
>>     even if the files were unpublished.
>>
>>     It really depends on the use-case... but e.g. I had to explain
>>     all this to a couple of people in the help-desk since the wget
>>     script they'd downloaded wasn't working anymore (files were
>>     removed). They weren't thrilled to know they had to reissue the
>>     search (there's no workaround for this) and they wanted to
>>     know what was changed in the new version, and that's where we
>>     can't help our users any more since we don't have that information...
>>
>>     I don't know what our users prefer, but I think they have more
>>     important problems to cope with at this time... if they could
>>     reliably get one version they could start worrying about others.
>>     From my perspective as a data manager, it's worth the tiny
>>     additional effort, if there's any.
>>
>>     Cheers,
>>     Estani
>>
>>     Am 09.01.2012 20:05, schrieb Karl Taylor:
>>>     Hi Estani,
>>>
>>>     I agree that a new version number should (I'd say must) be
>>>     assigned when any changes are made. However, except for DOI
>>>     datasets, most groups will not want older versions to be visible
>>>     or downloadable.
>>>
>>>     Do you agree?
>>>
>>>     cheers,
>>>     Karl
>>>
>>>     On 1/9/12 10:37 AM, Estanislao Gonzalez wrote:
>>>>     Hi Karl,
>>>>
>>>>     It is indeed a good point, but I must add that we are not
>>>>     talking about preserving a version (although we do it here at
>>>>     DKRZ) but of signaling that a version has been changed. So the
>>>>     version is a key to find a specific dataset which changes in time.
>>>>
>>>>     Even before a DOI assignment I'd encourage all to create a new
>>>>     version every time the dataset changes in any way. Institutions
>>>>     have the right to preserve whatever version they want (they may
>>>>     even delete DOI-assigned versions; archives, on the other hand,
>>>>     can't, that's what archives are for).
>>>>     But altering the dataset while preserving the version just brings
>>>>     chaos for the users and for us at the help-desk as we have to
>>>>     explain why something has changed (or rather answer that we
>>>>     don't know why...). It means that the same key now points to a
>>>>     different dataset.
>>>>
>>>>     The only benefits I can see for preserving the same version are
>>>>     that publishing using the same version seems to be easier to
>>>>     some (for our workflow it's not, it's exactly the same) and
>>>>     that if only new files are added this seems to work fine for
>>>>     publication at both the data-node and the gateway as it's
>>>>     properly supported.
>>>>     If anything else changes, this does not work as expected (wrong
>>>>     checksums, ghost files at the gateway, etc). And changing a
>>>>     version's contents makes no sense to the user IMHO (e.g. it's as
>>>>     if you might sometimes get more files from a tarred file... how
>>>>     often should you extract it to be sure you got "all of them"?)
>>>>
>>>>     If old versions were preserved (which takes almost no resources
>>>>     if using hardlinks), a simple comparison would tell that the
>>>>     only changes were the addition of some specific files.
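A rough Python sketch of the hard-link approach described above, and of the "simple comparison" between versions. It assumes a flat directory per version on a single filesystem; the function names and the returned dictionary layout are hypothetical, and this is not how DRSlib or the publisher actually implements it.

```python
import os
import shutil


def create_version(previous_dir, new_dir, new_files):
    """Create new_dir from previous_dir using hard links, then add new_files.

    new_files maps a file name to the path of its new content.
    """
    os.makedirs(new_dir)
    for name in sorted(os.listdir(previous_dir)):
        # A hard link shares the data blocks, so reused files cost almost no space.
        os.link(os.path.join(previous_dir, name), os.path.join(new_dir, name))
    for name, source in new_files.items():
        target = os.path.join(new_dir, name)
        if os.path.lexists(target):
            # Drop the link first; copying over it would also alter the old version.
            os.remove(target)
        shutil.copy2(source, target)


def compare_versions(old_dir, new_dir):
    """Report added, removed and replaced file names between two version directories."""
    old_names = set(os.listdir(old_dir))
    new_names = set(os.listdir(new_dir))
    replaced = sorted(
        name for name in old_names & new_names
        # Files that are no longer the same inode were replaced in the new version.
        if not os.path.samefile(os.path.join(old_dir, name), os.path.join(new_dir, name))
    )
    return {
        "added": sorted(new_names - old_names),
        "removed": sorted(old_names - new_names),
        "replaced": replaced,
    }
```

Note that the "replaced" check is based on link identity, not on content, so a byte-identical copy that is not a hard link would still be reported as replaced.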
>>>>
>>>>     Basically, reusing the version ends in a non-recoverable loss
>>>>     of information. That's why I discourage it.
>>>>
>>>>     My 2c,
>>>>     Estani
>>>>
>>>>     Am 09.01.2012 17:25, schrieb Karl Taylor:
>>>>>     Dear all,
>>>>>
>>>>>     I do not have time to read this thoroughly, so perhaps what
>>>>>     I'll mention here is irrelevant. There may be some
>>>>>     miscommunication about what is meant by "version". There are
>>>>>     two cases to consider:
>>>>>
>>>>>     1. Before a dataset has become official (i.e., assigned a
>>>>>     DOI), a group may choose to remove all record of it from the
>>>>>     database and publish a replacement version.
>>>>>
>>>>>     2. Alternatively, if a group wants to preserve a previous
>>>>>     version (as is required after a DOI has been assigned), then
>>>>>     the new version will not "replace" the previous version, but
>>>>>     simply be added to the archive.
>>>>>
>>>>>     It is possible that different publication procedures will
>>>>>     apply in these different cases.
>>>>>
>>>>>     best,
>>>>>     Karl
>>>>>
>>>>>     On 1/9/12 4:26 AM, Estanislao Gonzalez wrote:
>>>>>>     Just to mention that we do the same thing. We directly use
>>>>>>     --new-version and a map file containing all files for the new version,
>>>>>>     but we do create hard-links to the files being reused, so they are
>>>>>>     indeed all "new" as their paths always differ from those of previous
>>>>>>     versions. (In any case, for the publisher they are the same, and it thus
>>>>>>     encodes them with the nc_0 name if I recall correctly.)
>>>>>>
>>>>>>     Thanks,
>>>>>>     Estani
>>>>>>     Am 09.01.2012 12:15, schrieb stephen.pascoe at stfc.ac.uk:
>>>>>>>     Hi Bob,
>>>>>>>
>>>>>>>     This "unpublish first" requirement is news to me.  We've been publishing new versions without doing this for some time.  Now, we have come across difficulties with a few datasets but it's generally worked.
>>>>>>>
>>>>>>>     We don't use the --update option though.  Each time we publish a new version we provide a mapfile of all files in the dataset(s).  I'd recommend Sergey try doing this before removing a previous version.
>>>>>>>
>>>>>>>     If you unpublish from the Gateway first you'll lose the information in the "History" tab.  For instance http://cmip-gw.badc.rl.ac.uk/dataset/cmip5.output2.MOHC.HadGEM2-ES.rcp85.mon.aerosol.aero.r1i1p1.html shows 2 versions.
>>>>>>>
>>>>>>>     Stephen.
>>>>>>>
>>>>>>>     ---
>>>>>>>     Stephen Pascoe  +44 (0)1235 445980
>>>>>>>     Centre of Environmental Data Archival
>>>>>>>     STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
>>>>>>>
>>>>>>>
>>>>>>>     -----Original Message-----
>>>>>>>     From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Drach, Bob
>>>>>>>     Sent: 06 January 2012 20:53
>>>>>>>     To: Serguei Nikonov; Eric Nienhouse
>>>>>>>     Cc: go-essp-tech at ucar.edu
>>>>>>>     Subject: Re: [Go-essp-tech] Fwd: Re: Publishing dataset with option --update
>>>>>>>
>>>>>>>     Hi Sergey,
>>>>>>>
>>>>>>>     When updating a dataset, it's also important to unpublish it before publishing the new version. E.g, first run
>>>>>>>
>>>>>>>     esgunpublish <dataset_id>
>>>>>>>
>>>>>>>     The reason is that, when you publish to the gateway, the gateway software tries to *add* the new information to the existing dataset entry, rather than replace it.
>>>>>>>
>>>>>>>     --Bob
>>>>>>>     ________________________________________
>>>>>>>     From: Serguei Nikonov [serguei.nikonov at noaa.gov]
>>>>>>>     Sent: Friday, January 06, 2012 10:45 AM
>>>>>>>     To: Eric Nienhouse
>>>>>>>     Cc: Bob Drach; go-essp-tech at ucar.edu
>>>>>>>     Subject: Re: [Go-essp-tech] Fwd: Re:  Publishing dataset with option --update
>>>>>>>
>>>>>>>     Hi Eric,
>>>>>>>
>>>>>>>     thanks for your help. I have no objections to any adopted versioning
>>>>>>>     policy. What I need is to know how to apply it. The ways I used did not work for
>>>>>>>     me. Hopefully, the reason is the bad things in thredds and the database that you pointed
>>>>>>>     out. I am cleaning them right now, then will see...
>>>>>>>
>>>>>>>     Just for clarification, if I need to update a dataset (changing its version) I
>>>>>>>     create a map file containing the full set of files (old and new ones) and then use
>>>>>>>     this map file in the esgpublish script with option --update, is that correct? Will it
>>>>>>>     be enough for creating a new version of the dataset? BTW, there is nothing about
>>>>>>>     versions for option 'update' in the esgpublish help.
>>>>>>>
>>>>>>>     Thanks,
>>>>>>>     Sergey
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>     On 01/04/2012 04:27 PM, Eric Nienhouse wrote:
>>>>>>>>     Hi Serguei,
>>>>>>>>
>>>>>>>>     Following are a few more suggestions to diagnose this publishing issue. I agree
>>>>>>>>     with others on this thread that adding new files (or changing existing ones)
>>>>>>>>     should always trigger a new dataset version.
>>>>>>>>
>>>>>>>>     It does not appear you are receiving a final "SUCCESS" or failure message when
>>>>>>>>     publishing to the Gateway (with esgpublish --publish). Please try increasing
>>>>>>>>     your "polling" levels in your $ESGINI file. Eg:
>>>>>>>>
>>>>>>>>     hessian_service_polling_delay = 10
>>>>>>>>     hessian_service_polling_iterations = 500
>>>>>>>>
>>>>>>>>     You should see a final "SUCCESS" or "ERROR" with Java trace output at the
>>>>>>>>     termination of the command.
>>>>>>>>
>>>>>>>>     I've reviewed the Thredds catalog for the dataset you note below:
>>>>>>>>
>>>>>>>>
>>>>>>>>     http://esgdata.gfdl.noaa.gov/thredds/esgcet/1/cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.v2.xml
>>>>>>>>
>>>>>>>>
>>>>>>>>     There appear to be multiple instances of certain files within the catalog which
>>>>>>>>     is a problem. The Gateway publish will fail if a particular file (URL) is
>>>>>>>>     referenced multiple times with differing metadata. An example is:
>>>>>>>>
>>>>>>>>
>>>>>>>>     */gfdl_dataroot/NOAA-GFDL/GFDL-CM3/historical/mon/atmos/Amon/r1i1p1/v20110601/rtmt/rtmt_Amon_GFDL-CM3_historical_r1i1p1_186001-186412.nc
>>>>>>>>
>>>>>>>>
>>>>>>>>     This file appears as two separate file versions in the Thredds catalog (one with
>>>>>>>>     id ending in ".nc" and another with ".nc_0"). There should be only one reference
>>>>>>>>     to this file URL in the catalog.
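One way to look for the duplicate references Eric describes is sketched below in Python. It assumes that each file entry in the catalog is a "dataset" element carrying "urlPath" and "ID" attributes, which is the usual THREDDS catalog layout; the function name is made up, and the check is independent of the publisher itself.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def duplicate_url_paths(catalog_xml):
    """Map each urlPath used by more than one dataset element to those elements' IDs."""
    root = ET.fromstring(catalog_xml)
    ids_by_path = defaultdict(list)
    for element in root.iter():
        # Strip any XML namespace, e.g. "{namespace}dataset" -> "dataset".
        local_name = element.tag.rsplit("}", 1)[-1]
        url_path = element.get("urlPath")
        if local_name == "dataset" and url_path is not None:
            ids_by_path[url_path].append(element.get("ID", ""))
    return {path: ids for path, ids in ids_by_path.items() if len(ids) > 1}
```

For a catalog like the one above, the result would be expected to pair the rtmt file's path with two IDs, one ending in ".nc" and one ending in ".nc_0".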
>>>>>>>>
>>>>>>>>     The previous version of the dataset in the publisher/node database may be
>>>>>>>>     leading to this issue. You may need to add "--database-delete" to your
>>>>>>>>     esgunpublish command to clean things up. Bob can advise on this. Note that the
>>>>>>>>     original esgpublish command shown in this email thread included "--keep-version".
>>>>>>>>
>>>>>>>>     After publishing to the Gateway successfully, you can check the dataset details
>>>>>>>>     by URL with the published dataset identifier. For example:
>>>>>>>>
>>>>>>>>
>>>>>>>>     http://pcmdi3.llnl.gov/esgcet/dataset/cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.html
>>>>>>>>
>>>>>>>>
>>>>>>>>     I hope this helps.
>>>>>>>>
>>>>>>>>     Regards,
>>>>>>>>
>>>>>>>>     -Eric
>>>>>>>>
>>>>>>>>     Serguei Nikonov wrote:
>>>>>>>>>     Hi Bob,
>>>>>>>>>
>>>>>>>>>     I still can not do anything about updating datasets. The commands you
>>>>>>>>>     suggested executed successfully but the datasets did not appear on the gateway. I
>>>>>>>>>     tried it several times for different datasets but the result is the same.
>>>>>>>>>
>>>>>>>>>     Do you have any idea what to do in such a situation?
>>>>>>>>>
>>>>>>>>>     Here are some details about what I tried.
>>>>>>>>>     I needed to add a file to dataset
>>>>>>>>>     cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.
>>>>>>>>>     As you advised I unpublished it (esgunpublish
>>>>>>>>>     cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1) and then
>>>>>>>>>     created a full mapfile (with the additional file) and then published it:
>>>>>>>>>     esgpublish --read-files --map new_mapfile --project cmip5 --thredd --publish
>>>>>>>>>
>>>>>>>>>     As I said, there were no errors. The dataset is in the database and in thredds but
>>>>>>>>>     not in the gateway.
>>>>>>>>>
>>>>>>>>>     The second way I tried is using a mapfile containing only the files to update. I
>>>>>>>>>     needed to replace several existing files in a dataset with new ones. I created
>>>>>>>>>     a mapfile with only the files to be replaced:
>>>>>>>>>     esgscan_directory --read-files --project cmip5 -o mapfile.txt
>>>>>>>>>     /data/CMIP5/output1/NOAA-GFDL/GFDL-ESM2M/historical/mon/ocean/Omon/r1i1p1/v20111206
>>>>>>>>>
>>>>>>>>>     and then published it with update option:
>>>>>>>>>     esgpublish --update --map mapfile.txt --project cmip5 --thredd --publish.
>>>>>>>>>
>>>>>>>>>     The result is the same as in the previous case - all things are fine locally but
>>>>>>>>>     nothing happened on the gateway.
>>>>>>>>>
>>>>>>>>>     Thanks,
>>>>>>>>>     Sergey
>>>>>>>>>
>>>>>>>>>     -------- Original Message --------
>>>>>>>>>     Subject: Re: [Go-essp-tech] Publishing dataset with option --update
>>>>>>>>>     Date: Thu, 29 Dec 2011 11:02:05 -0500
>>>>>>>>>     From: Serguei Nikonov <Serguei.Nikonov at noaa.gov>
>>>>>>>>>     Organization: GFDL
>>>>>>>>>     To: Drach, Bob <drach1 at llnl.gov>
>>>>>>>>>     CC: Nathan Wilhelmi <wilhelmi at ucar.edu>, "Ganzberger, Michael"
>>>>>>>>>     <Ganzberger1 at llnl.gov>, "go-essp-tech at ucar.edu" <go-essp-tech at ucar.edu>
>>>>>>>>>
>>>>>>>>>     Hi Bob,
>>>>>>>>>
>>>>>>>>>     I tried the 1st way you suggested and it worked partially - the dataset was
>>>>>>>>>     created on the datanode with version 2 but it did not show up on the gateway. To make
>>>>>>>>>     sure that it's not an accidental result I repeated it with other datasets, with
>>>>>>>>>     the same result.
>>>>>>>>>     Now I have 2 datasets on the datanode (visible in the thredds server) but they are
>>>>>>>>>     absent on the gateway:
>>>>>>>>>     cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.v2
>>>>>>>>>     cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r2i1p1.v2.
>>>>>>>>>
>>>>>>>>>     Does it make sense to repeat esgpublish with 'publish' option?
>>>>>>>>>
>>>>>>>>>     Thanks and Happy New Year,
>>>>>>>>>     Sergey
>>>>>>>>>
>>>>>>>>>     On 12/21/2011 08:41 PM, Drach, Bob wrote:
>>>>>>>>>>     Hi Sergey,
>>>>>>>>>>
>>>>>>>>>>     The way I would recommend adding new files to an existing dataset is as
>>>>>>>>>>     follows:
>>>>>>>>>>
>>>>>>>>>>     - Unpublish the previous dataset from the gateway and thredds
>>>>>>>>>>
>>>>>>>>>>     % esgunpublish cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1
>>>>>>>>>>
>>>>>>>>>>     - Add the new files to the existing mapfile for the dataset they are being
>>>>>>>>>>     added to.
>>>>>>>>>>
>>>>>>>>>>     - Republish with the expanded mapfile:
>>>>>>>>>>
>>>>>>>>>>     % esgpublish --read-files --map newmap.txt --project cmip5 --thredds --publish
>>>>>>>>>>
>>>>>>>>>>     The publisher will:
>>>>>>>>>>     - not rescan existing files, only the new files
>>>>>>>>>>     - create a new version to reflect the additional files
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>     Alternatively you can create a mapfile with *only* the new files (Using
>>>>>>>>>>     esgscan_directory), then republish using the --update command.
>>>>>>>>>>
>>>>>>>>>>     --Bob
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>     On 12/21/11 8:40 AM, "Serguei Nikonov"<serguei.nikonov at noaa.gov>  wrote:
>>>>>>>>>>
>>>>>>>>>>>     Hi Nate,
>>>>>>>>>>>
>>>>>>>>>>>     unfortunately this is not the only dataset I have a problem with - there are at
>>>>>>>>>>>     least 5 more. Should I unpublish them locally (db, thredds) and then create a new
>>>>>>>>>>>     version containing the full set of files? What is the official way to update
>>>>>>>>>>>     a dataset?
>>>>>>>>>>>
>>>>>>>>>>>     Thanks,
>>>>>>>>>>>     Sergey
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     On 12/20/2011 07:06 PM, Nathan Wilhelmi wrote:
>>>>>>>>>>>>     Hi Bob/Mike,
>>>>>>>>>>>>
>>>>>>>>>>>>     I believe the problem is that when files were added the timestamp on the
>>>>>>>>>>>>     dataset wasn't updated.
>>>>>>>>>>>>
>>>>>>>>>>>>     The triple store will only harvest datasets that have files and an updated
>>>>>>>>>>>>     timestamp after the last harvest.
>>>>>>>>>>>>
>>>>>>>>>>>>     So what likely happened is the dataset was created without files, so it
>>>>>>>>>>>>     wasn't initially harvested. Files were subsequently added, but the timestamp wasn't
>>>>>>>>>>>>     updated, so it was still not a candidate for harvesting.
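The harvesting rule Nate describes can be written down as a small Python predicate. This only restates the rule as given in this message; the function and parameter names are assumptions, and the actual Gateway code is not shown anywhere in this thread.

```python
from datetime import datetime


def is_harvest_candidate(file_count, date_updated, last_harvest):
    """A dataset is harvested only if it has files AND was updated after the last harvest."""
    return file_count > 0 and date_updated > last_harvest


if __name__ == "__main__":
    last_harvest = datetime(2011, 12, 20, 12, 0)
    created = datetime(2011, 12, 20, 11, 0)
    # Created without files: skipped by the first harvest.
    print(is_harvest_candidate(0, created, last_harvest))
    # Files added later, but the timestamp left unchanged: still skipped.
    print(is_harvest_candidate(5, created, last_harvest))
    # Timestamp bumped after the last harvest: now picked up.
    print(is_harvest_candidate(5, datetime(2011, 12, 20, 13, 0), last_harvest))
```

The second case is the situation suspected here, which is why touching date_updated is the suggested fix.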
>>>>>>>>>>>>
>>>>>>>>>>>>     Can you update the date_updated timestamp for the dataset in question and
>>>>>>>>>>>>     then trigger the RDF harvesting? I believe the dataset will show up then.
>>>>>>>>>>>>
>>>>>>>>>>>>     Thanks!
>>>>>>>>>>>>     -Nate
>>>>>>>>>>>>
>>>>>>>>>>>>     On 12/20/2011 11:49 AM, Serguei Nikonov wrote:
>>>>>>>>>>>>>     Hi Mike,
>>>>>>>>>>>>>
>>>>>>>>>>>>>     I am a member of the data publishers group. I have been publishing a considerable
>>>>>>>>>>>>>     amount of data without this kind of trouble; this problem occurred only when
>>>>>>>>>>>>>     I tried to add some files to an existing dataset. Publishing from scratch works
>>>>>>>>>>>>>     fine for me.
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Thanks,
>>>>>>>>>>>>>     Sergey
>>>>>>>>>>>>>
>>>>>>>>>>>>>     On 12/20/2011 01:29 PM, Ganzberger, Michael wrote:
>>>>>>>>>>>>>>     Hi Serguei,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     That task is on a scheduler and will re-run every 10 minutes. If your data
>>>>>>>>>>>>>>     does not appear after that time then perhaps there is another issue. One
>>>>>>>>>>>>>>     issue could be that publishing to the gateway requires that you have the
>>>>>>>>>>>>>>     role of "Data Publisher";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     "check that the account is member of the proper group and has the special
>>>>>>>>>>>>>>     role of Data Publisher"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     http://esgf.org/wiki/ESGFNode/FAQ
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Mike
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     -----Original Message-----
>>>>>>>>>>>>>>     From: Serguei Nikonov [mailto:serguei.nikonov at noaa.gov]
>>>>>>>>>>>>>>     Sent: Tuesday, December 20, 2011 10:12 AM
>>>>>>>>>>>>>>     To: Ganzberger, Michael
>>>>>>>>>>>>>>     Cc: Stéphane Senesi; Drach, Bob; go-essp-tech at ucar.edu
>>>>>>>>>>>>>>     Subject: Re: [Go-essp-tech] Publishing dataset with option --update
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Hi Mike,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     thanks for the suggestion but I don't have any privileges to do anything on
>>>>>>>>>>>>>>     the gateway.
>>>>>>>>>>>>>>     I am just publishing data on GFDL data node.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Regards,
>>>>>>>>>>>>>>     Sergey
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     On 12/20/2011 01:05 PM, Ganzberger, Michael wrote:
>>>>>>>>>>>>>>>     Hi Serguei,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     I'd like to suggest the following, which may help you, from
>>>>>>>>>>>>>>>     http://esgf.org/wiki/Cmip5Gateway/FAQ
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     "The search does not reflect the latest DB changes I've made
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     You have to manually trigger the 3store harvesting. Logging as root and go
>>>>>>>>>>>>>>>     to Admin->"Gateway Scheduled Tasks"->"Run tasks" and restart the job named
>>>>>>>>>>>>>>>     RDFSynchronizationJobDetail"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     Mike Ganzberger
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     -----Original Message-----
>>>>>>>>>>>>>>>     From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu]
>>>>>>>>>>>>>>>     On Behalf Of Stéphane Senesi
>>>>>>>>>>>>>>>     Sent: Tuesday, December 20, 2011 9:42 AM
>>>>>>>>>>>>>>>     To: Serguei Nikonov
>>>>>>>>>>>>>>>     Cc: Drach, Bob; go-essp-tech at ucar.edu
>>>>>>>>>>>>>>>     Subject: Re: [Go-essp-tech] Publishing dataset with option --update
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     Serguei
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     We have for some time now experienced similar problems when publishing
>>>>>>>>>>>>>>>     to the PCMDI gateway, i.e. not getting a "SUCCESS" message when
>>>>>>>>>>>>>>>     publishing. Sometimes, files are actually published (or at least
>>>>>>>>>>>>>>>     accessible through the gateway, their status being actually
>>>>>>>>>>>>>>>     "START_PUBLISHING", according to the esg_list_datasets report), sometimes not. A
>>>>>>>>>>>>>>>     hypothesis is that the PCMDI Gateway load causes the problem. We
>>>>>>>>>>>>>>>     haven't yet got a confirmation from Bob.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     In contrast to your case, this happens when publishing a dataset from
>>>>>>>>>>>>>>>     scratch (I mean, not an update)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     Best regards (do not expect any feedback from me before early January)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     S
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     Serguei Nikonov wrote, On 20/12/2011 18:11:
>>>>>>>>>>>>>>>>     Hi Bob,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     I needed to add some missing variables to an existing dataset and I found in
>>>>>>>>>>>>>>>>     the esgpublish command an option --update. When I tried it I got normal
>>>>>>>>>>>>>>>>     messages like
>>>>>>>>>>>>>>>>     INFO 2011-12-20 11:21:00,893 Publishing:
>>>>>>>>>>>>>>>>     cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1, parent = pcmdi.GFDL
>>>>>>>>>>>>>>>>     INFO 2011-12-20 11:21:07,564 Result: PROCESSING
>>>>>>>>>>>>>>>>     INFO 2011-12-20 11:21:11,209 Result: PROCESSING
>>>>>>>>>>>>>>>>     ....
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     but nothing happened on the gateway - the new variables are not there. The files
>>>>>>>>>>>>>>>>     corresponding to these variables are in the database and in the THREDDS catalog
>>>>>>>>>>>>>>>>     but apparently were not published on the gateway.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     I used the command line
>>>>>>>>>>>>>>>>     esgpublish --update --keep-version --map <map_file> --project cmip5 --noscan --publish
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     Should the map file be of some specific format to make it work in the mode I
>>>>>>>>>>>>>>>>     need?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     Thanks,
>>>>>>>>>>>>>>>>     Sergey Nikonov
>>>>>>>>>>>>>>>>     GFDL
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     _______________________________________________
>>>>>>>>>>>>>>>>     GO-ESSP-TECH mailing list
>>>>>>>>>>>>>>>>     GO-ESSP-TECH at ucar.edu
>>>>>>>>>>>>>>>>     http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>     --
>>>>>>     Estanislao Gonzalez
>>>>>>
>>>>>>     Max-Planck-Institut für Meteorologie (MPI-M)
>>>>>>     Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
>>>>>>     Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany
>>>>>>
>>>>>>     Phone:   +49 (40) 46 00 94-126
>>>>>>     E-Mail:  gonzalez at dkrz.de
>>>>>>