[Go-essp-tech] Visibility of old versions, was... RE: Fwd: Re: Publishing dataset with option --update

Estanislao Gonzalez gonzalez at dkrz.de
Tue Jan 10 06:31:35 MST 2012


Hi Jamie,

Indeed, DOIs are not going to solve everything. The DOI is analogous to
the ISBN of a book, and citing the whole book is not always what you
want. To continue the analogy, users are indeed working with pre-prints
which get corrected all the time (i.e. the CMIP5 archive is in flux).
People are writing papers citing a pre-print. Of course this makes no
sense, but they are not doing so willingly; they have to, as the
deadline approaches but the computing groups are not ready.

So what do we have now? Some archives with a strong commitment to
preserving data.
If the DRS were honored, the URL would be enough for citing any file,
as it has the version in it. Admittedly, citing 1000+ URLs is not
practical, but a redirection could be added so that the scientist cites
one URL under which all file URLs are listed (there's no implementation
for this AFAIK). But at least the URLs of DRS-committed sites could be
safely cited, and if the checksum is attached to the citation, it is
certain that the correct file is always being cited (and it could even
be found if moved).
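
For illustration, a single-file citation under that scheme might look
something like the following (the host and checksum are hypothetical;
the path follows the CMIP5 DRS structure, version segment included):

    http://esgf.example.org/thredds/fileServer/cmip5/output1/MOHC/HadGEM2-ES/rcp85/mon/atmos/Amon/r1i1p1/v20111128/tas/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_200512-203011.nc
    (MD5 checksum: <as computed with md5sum>)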

I don't know how citations are being done now, nor do I know how they
were done before, when everyone was citing data that was almost
impossible to get. DOIs are the very first step in the right direction,
not the last one.
IMHO the community should come up with some best practices to overcome
the problem we are facing: how to cite something that is constantly
changing. Sharing these will certainly help everyone.
Before jumping away from this subject I'd also like to add that I don't
see any proper communication mechanism in the community. All (or at
least most) questions regarding CMIP5 are, AFAICT, directed to the
help-desk, so mostly developers are trying to help the community
instead of the community trying to help itself. I think we might be
missing some kind of platform for doing this. We don't have the means
to support the growing community (and the new communities which we are
now serving); we need them to help with the "helping". Just a
thought....

And last, and probably least: the only way to get the latest version of
any dataset is to re-issue the search. Since a wget script usually
refers to multiple datasets, finding the latest version of each of them
"by hand" would be more time-consuming than issuing the search query
again.

Thanks,
Estani

On 10.01.2012 13:00, Kettleborough, Jamie wrote:
> Hello,
> I'm not sure how to say this, but I'm not sure it's just down to DOIs
> to determine whether a dataset should always be visible. I think data
> needs to be visible where it's sufficiently important that a user might
> want to download it, e.g. they want to check or extend someone else's
> study (and I think there are other reasons). It's not clear to me that
> all data of this kind will have a DOI - for instance, how many of the
> datasets referenced in papers being written now for the summer
> deadline of AR5 have (or will have in time) DOIs?
> I know it's tempting to say that any dataset referenced in a paper
> should have a DOI, but I think you need to be realistic about the
> prospects of this happening on the right timescales.
> If the DOI is used as the determinant of whether data is always
> visible, then should users be made aware of the risk they are carrying
> now? For instance, so they know to keep local backups of data that is
> really important to them. (With the possible implication, too, that
> they may need to be prepared to 're-share' this data with others.)
> For what it's worth, my personal preference is for the BADC/DKRZ (and
> I'm sure others') philosophy of keeping all versions - though I realise
> there are costs in doing this, like getting DRSlib sufficiently
> bug-free, getting it to work in all the contexts it needs to (hard
> links/soft links), getting it deployed, getting the update mechanism
> in place for when new bugs are found, etc. If you used DRSlib, doesn't
> Estani's use case that caused user grief become easier too - the wget
> scripts would not need regenerating; you should instead be able to
> replace the version strings in the URLs (though I may be assuming
> things about load balancing etc. in saying this).
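>
> A minimal sketch of that idea, assuming the wget script embeds
> DRS-style URLs containing a version segment (the script name and
> version strings are hypothetical):
>
>     # Point every file URL in the script at the new version
>     sed -i 's|/v20111201/|/v20120110/|g' cmip5-download.sh
>
>     # Then re-run the script against the rewritten URLs
>     bash cmip5-download.sh
>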
> Jamie
>
>     ------------------------------------------------------------------------
>     *From:* go-essp-tech-bounces at ucar.edu
>     [mailto:go-essp-tech-bounces at ucar.edu] *On Behalf Of *Estanislao
>     Gonzalez
>     *Sent:* 10 January 2012 10:21
>     *To:* Karl Taylor
>     *Cc:* Drach, Bob; go-essp-tech at ucar.edu; serguei.nikonov at noaa.gov
>     *Subject:* Re: [Go-essp-tech] Fwd: Re: Publishing dataset with
>     option --update
>
>     Well, to be honest, I do agree this is a decision each institution
>     has to make, but for us I'd prefer offering everything we have and
>     letting the systems decide what to do with this information. E.g.
>     I've used it to generate some comments (I might have already shown
>     you this); just go here:
>     http://ipcc-ar5.dkrz.de/dataset/cmip5.output1.NCC.NorESM1-M.sstClim.mon.land.Lmon.r1i1p1.html
>     and click on "history".
>     That information could be generated only because we store the
>     metadata of the previous version.
>
>     By the way, the only way of preventing the user from getting an
>     older version, if that's what is wanted, is either removing the
>     files from the TDS-served directory or changing the access
>     restriction at the Gateway. Because of a well-known TDS bug (or
>     feature), files present in that directory but not found in any
>     catalog are served without any restriction (AFAIK no certificate
>     is required for this). So, normally, the wget script would work
>     even if the files were unpublished.
>
>     It really depends on the use-case... but, e.g., I had to explain
>     all this to a couple of people at the help-desk, since the wget
>     script they had downloaded wasn't working anymore (the files had
>     been removed). They weren't thrilled to learn they had to re-issue
>     the search (there's no workaround for this), and they wanted to
>     know what had changed in the new version - and that's where we
>     can't help our users any more, since we don't have that
>     information...
>
>     I don't know what our users prefer, but I think they have more
>     important problems to cope with at this time... if they could
>     reliably get one version they could start worrying about others.
>     From my perspective as a data manager, it's worth the tiny
>     additional effort, if there's any.
>
>     Cheers,
>     Estani
>
>     On 09.01.2012 20:05, Karl Taylor wrote:
>>     Hi Estani,
>>
>>     I agree that a new version number should (I'd say must) be
>>     assigned when any changes are made. However, except for DOI
>>     datasets, most groups will not want older versions to be visible
>>     or downloadable.
>>
>>     Do you agree?
>>
>>     cheers,
>>     Karl
>>
>>     On 1/9/12 10:37 AM, Estanislao Gonzalez wrote:
>>>     Hi Karl,
>>>
>>>     It is indeed a good point, but I must add that we are not
>>>     talking about preserving a version (although we do that here at
>>>     DKRZ) but about signaling that a version has changed. So the
>>>     version is a key for finding a specific dataset, which changes
>>>     over time.
>>>
>>>     Even before a DOI assignment, I'd encourage everyone to create a
>>>     new version every time the dataset changes in any way.
>>>     Institutions have the right to preserve whatever versions they
>>>     want (they may even delete DOI-assigned versions; archives, on
>>>     the other hand, can't - that's what archives are for).
>>>     But altering a dataset while preserving its version just brings
>>>     chaos for the users and for us at the help-desk, as we have to
>>>     explain why something has changed (or rather answer that we
>>>     don't know why...). It means that the same key now points to a
>>>     different dataset.
>>>
>>>     The only benefits I can see in reusing the same version are that
>>>     publishing under the same version seems easier to some (for our
>>>     workflow it's not; it's exactly the same) and that, if only new
>>>     files are added, this seems to work fine for publication at both
>>>     the data-node and the gateway, as it's properly supported.
>>>     If anything else changes, this does not work as expected (wrong
>>>     checksums, ghost files at the gateway, etc.). And changing a
>>>     version's contents makes no sense to the user, IMHO (e.g. it's
>>>     as if a tarred file might sometimes yield more files... how
>>>     often should you extract it to be sure you got "all of them"?).
>>>
>>>     If old versions were preserved (which takes almost no resources
>>>     when using hardlinks; see the sketch below), a simple comparison
>>>     would show that the only change was the addition of some
>>>     specific files.
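>>>
>>>     A minimal sketch of the hardlink approach (the paths are
>>>     hypothetical): each new version starts as a hardlink copy of
>>>     the previous one, so unchanged files take no extra disk space:
>>>
>>>         # Hardlink-copy the old version, then add the new files
>>>         cp -al /data/dataset/v20111201 /data/dataset/v20120110
>>>         cp /incoming/new_file.nc /data/dataset/v20120110/
>>>
>>>         # A simple comparison then shows exactly what changed
>>>         diff -rq /data/dataset/v20111201 /data/dataset/v20120110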
>>>
>>>     Basically, reusing a version results in a non-recoverable loss
>>>     of information. That's why I discourage it.
>>>
>>>     My 2c,
>>>     Estani
>>>
>>>     On 09.01.2012 17:25, Karl Taylor wrote:
>>>>     Dear all,
>>>>
>>>>     I do not have time to read this thoroughly, so perhaps what
>>>>     I'll mention here is irrelevant. There may be some
>>>>     miscommunication about what is meant by "version". There are
>>>>     two cases to consider:
>>>>
>>>>     1. Before a dataset has become official (i.e., assigned a DOI),
>>>>     a group may choose to remove all record of it from the database
>>>>     and publish a replacement version.
>>>>
>>>>     2. Alternatively, if a group wants to preserve a previous
>>>>     version (as is required after a DOI has been assigned), then
>>>>     the new version will not "replace" the previous version, but
>>>>     simply be added to the archive.
>>>>
>>>>     It is possible that different publication procedures will apply
>>>>     in these different cases.
>>>>
>>>>     best,
>>>>     Karl
>>>>
>>>>     On 1/9/12 4:26 AM, Estanislao Gonzalez wrote:
>>>>>     Just to mention that we do the same thing. We use
>>>>>     --new-version directly and a map file containing all files for
>>>>>     the new version, but we do create hard-links to the files
>>>>>     being reused, so they are indeed all "new", as their paths
>>>>>     always differ from those of previous versions. (In any case,
>>>>>     for the publisher they are the same, and it thus encodes them
>>>>>     with the nc_0 name, if I recall correctly.)
>>>>>
>>>>>     Thanks,
>>>>>     Estani
>>>>>     On 09.01.2012 12:15, stephen.pascoe at stfc.ac.uk wrote:
>>>>>>     Hi Bob,
>>>>>>
>>>>>>     This "unpublish first" requirement is news to me.  We've been publishing new versions without doing this for some time.  Now, we have come across difficulties with a few datasets but it's generally worked.
>>>>>>
>>>>>>     We don't use the --update option though.  Each time we publish a new version we provide a mapfile of all files in the dataset(s).  I'd recommend Sergey try doing this before removing a previous version.
>>>>>>
>>>>>>     If you unpublish from the Gateway first you'll lose the information in the "History" tab.  For instance, http://cmip-gw.badc.rl.ac.uk/dataset/cmip5.output2.MOHC.HadGEM2-ES.rcp85.mon.aerosol.aero.r1i1p1.html shows 2 versions.
>>>>>>
>>>>>>     Stephen.
>>>>>>
>>>>>>     ---
>>>>>>     Stephen Pascoe  +44 (0)1235 445980
>>>>>>     Centre of Environmental Data Archival
>>>>>>     STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
>>>>>>
>>>>>>
>>>>>>     -----Original Message-----
>>>>>>     From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Drach, Bob
>>>>>>     Sent: 06 January 2012 20:53
>>>>>>     To: Serguei Nikonov; Eric Nienhouse
>>>>>>     Cc: go-essp-tech at ucar.edu
>>>>>>     Subject: Re: [Go-essp-tech] Fwd: Re: Publishing dataset with option --update
>>>>>>
>>>>>>     Hi Sergey,
>>>>>>
>>>>>>     When updating a dataset, it's also important to unpublish it before publishing the new version. E.g., first run
>>>>>>
>>>>>>     esgunpublish <dataset_id>
>>>>>>
>>>>>>     The reason is that, when you publish to the gateway, the gateway software tries to *add* the new information to the existing dataset entry, rather than replace it.
>>>>>>
>>>>>>     --Bob
>>>>>>     ________________________________________
>>>>>>     From: Serguei Nikonov [serguei.nikonov at noaa.gov]
>>>>>>     Sent: Friday, January 06, 2012 10:45 AM
>>>>>>     To: Eric Nienhouse
>>>>>>     Cc: Bob Drach; go-essp-tech at ucar.edu
>>>>>>     Subject: Re: [Go-essp-tech] Fwd: Re:  Publishing dataset with option --update
>>>>>>
>>>>>>     Hi Eric,
>>>>>>
>>>>>>     thanks for your help. I have no objections to whatever versioning policy is
>>>>>>     adopted. What I need to know is how to apply it. The ways I tried did not work
>>>>>>     for me. Hopefully the reason is the bad entries in thredds and the database you
>>>>>>     pointed out. I am cleaning them up right now; then we will see...
>>>>>>
>>>>>>     Just for clarification: if I need to update a dataset (changing its version), I
>>>>>>     create a map file containing the full set of files (old and new ones) and then
>>>>>>     use this map file with esgpublish's --update option - is that correct? Will it
>>>>>>     be enough to create a new version of the dataset? BTW, there is nothing about
>>>>>>     versions under the 'update' option in the esgpublish help.
>>>>>>
>>>>>>     Thanks,
>>>>>>     Sergey
>>>>>>
>>>>>>
>>>>>>
>>>>>>     On 01/04/2012 04:27 PM, Eric Nienhouse wrote:
>>>>>>>     Hi Serguei,
>>>>>>>
>>>>>>>     Following are a few more suggestions for diagnosing this publishing issue. I agree
>>>>>>>     with others on this thread that adding new files (or changing existing ones)
>>>>>>>     should always trigger a new dataset version.
>>>>>>>
>>>>>>>     It does not appear you are receiving a final "SUCCESS" or failure message when
>>>>>>>     publishing to the Gateway (with esgpublish --publish). Please try increasing
>>>>>>>     the "polling" levels in your $ESGINI file. E.g.:
>>>>>>>
>>>>>>>     hessian_service_polling_delay = 10
>>>>>>>     hessian_service_polling_iterations = 500
>>>>>>>
>>>>>>>     You should see a final "SUCCESS" or "ERROR" with Java trace output at the
>>>>>>>     termination of the command.
>>>>>>>
>>>>>>>     I've reviewed the Thredds catalog for the dataset you note below:
>>>>>>>
>>>>>>>
>>>>>>>     http://esgdata.gfdl.noaa.gov/thredds/esgcet/1/cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.v2.xml
>>>>>>>
>>>>>>>
>>>>>>>     There appear to be multiple instances of certain files within the catalog,
>>>>>>>     which is a problem. The Gateway publish will fail if a particular file (URL)
>>>>>>>     is referenced multiple times with differing metadata. An example is:
>>>>>>>
>>>>>>>
>>>>>>>     */gfdl_dataroot/NOAA-GFDL/GFDL-CM3/historical/mon/atmos/Amon/r1i1p1/v20110601/rtmt/rtmt_Amon_GFDL-CM3_historical_r1i1p1_186001-186412.nc
>>>>>>>
>>>>>>>
>>>>>>>     This file appears as two separate file versions in the Thredds catalog (one with
>>>>>>>     id ending in ".nc" and another with ".nc_0"). There should be only one reference
>>>>>>>     to this file URL in the catalog.
>>>>>>>
>>>>>>>     The previous version of the dataset in the publisher/node database may be
>>>>>>>     leading to this issue. You may need to add "--database-delete" to your
>>>>>>>     esgunpublish command to clean things up. Bob can advise on this. Note that the
>>>>>>>     original esgpublish command shown in this email thread included "--keep-version".
>>>>>>>
>>>>>>>     After publishing to the Gateway successfully, you can check the dataset details
>>>>>>>     by URL with the published dataset identifier. For example:
>>>>>>>
>>>>>>>
>>>>>>>     http://pcmdi3.llnl.gov/esgcet/dataset/cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.html
>>>>>>>
>>>>>>>
>>>>>>>     I hope this helps.
>>>>>>>
>>>>>>>     Regards,
>>>>>>>
>>>>>>>     -Eric
>>>>>>>
>>>>>>>     Serguei Nikonov wrote:
>>>>>>>>     Hi Bob,
>>>>>>>>
>>>>>>>>     I still cannot make any progress updating datasets. The commands you
>>>>>>>>     suggested executed successfully, but the datasets did not appear on the
>>>>>>>>     gateway. I tried several times with different datasets, but the result is
>>>>>>>>     the same.
>>>>>>>>
>>>>>>>>     Do you have any idea what to do in such a situation?
>>>>>>>>
>>>>>>>>     Here are some details about what I tried.
>>>>>>>>     I needed to add a file to the dataset
>>>>>>>>     cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.
>>>>>>>>     As you advised, I unpublished it (esgunpublish
>>>>>>>>     cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1), then
>>>>>>>>     created a full mapfile (including the additional file) and published it:
>>>>>>>>     esgpublish --read-files --map new_mapfile --project cmip5 --thredds --publish
>>>>>>>>
>>>>>>>>     As I said, there were no errors. The dataset is in the database and in
>>>>>>>>     thredds, but not in the gateway.
>>>>>>>>
>>>>>>>>     The second way I tried was using a mapfile containing only the files to
>>>>>>>>     update. I needed to substitute new files for several existing files in a
>>>>>>>>     dataset. I created a mapfile with only the files to be substituted:
>>>>>>>>     esgscan_directory --read-files --project cmip5 -o mapfile.txt
>>>>>>>>     /data/CMIP5/output1/NOAA-GFDL/GFDL-ESM2M/historical/mon/ocean/Omon/r1i1p1/v20111206
>>>>>>>>
>>>>>>>>     and then published it with the update option:
>>>>>>>>     esgpublish --update --map mapfile.txt --project cmip5 --thredds --publish
>>>>>>>>
>>>>>>>>     The result is the same as in the previous case - everything is fine locally,
>>>>>>>>     but nothing happened on the gateway.
>>>>>>>>
>>>>>>>>     Thanks,
>>>>>>>>     Sergey
>>>>>>>>
>>>>>>>>     -------- Original Message --------
>>>>>>>>     Subject: Re: [Go-essp-tech] Publishing dataset with option --update
>>>>>>>>     Date: Thu, 29 Dec 2011 11:02:05 -0500
>>>>>>>>     From: Serguei Nikonov<Serguei.Nikonov at noaa.gov>
>>>>>>>>     Organization: GFDL
>>>>>>>>     To: Drach, Bob<drach1 at llnl.gov>
>>>>>>>>     CC: Nathan Wilhelmi<wilhelmi at ucar.edu>, "Ganzberger, Michael"
>>>>>>>>     <Ganzberger1 at llnl.gov>, "go-essp-tech at ucar.edu"<go-essp-tech at ucar.edu>
>>>>>>>>
>>>>>>>>     Hi Bob,
>>>>>>>>
>>>>>>>>     I tried the first way you suggested and it worked partially - the dataset
>>>>>>>>     was created on the datanode with version 2, but it did not show up on the
>>>>>>>>     gateway. To make sure this was not a one-off result, I repeated it with
>>>>>>>>     other datasets, with the same result.
>>>>>>>>     Now I have 2 datasets on the datanode (visible in the thredds server) that
>>>>>>>>     are absent on the gateway:
>>>>>>>>     cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.v2
>>>>>>>>     cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r2i1p1.v2
>>>>>>>>
>>>>>>>>     Does it make sense to repeat esgpublish with the 'publish' option?
>>>>>>>>
>>>>>>>>     Thanks and Happy New Year,
>>>>>>>>     Sergey
>>>>>>>>
>>>>>>>>     On 12/21/2011 08:41 PM, Drach, Bob wrote:
>>>>>>>>>     Hi Sergey,
>>>>>>>>>
>>>>>>>>>     The way I would recommend adding new files to an existing dataset is as
>>>>>>>>>     follows:
>>>>>>>>>
>>>>>>>>>     - Unpublish the previous dataset from the gateway and thredds
>>>>>>>>>
>>>>>>>>>     % esgunpublish
>>>>>>>>>     cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1
>>>>>>>>>
>>>>>>>>>     - Add the new files to the existing mapfile for the dataset they are being
>>>>>>>>>     added to.
>>>>>>>>>
>>>>>>>>>     - Republish with the expanded mapfile:
>>>>>>>>>
>>>>>>>>>     % esgpublish --read-files --map newmap.txt --project cmip5 --thredds
>>>>>>>>>     --publish
>>>>>>>>>
>>>>>>>>>     The publisher will:
>>>>>>>>>     - not rescan existing files, only the new files
>>>>>>>>>     - create a new version to reflect the additional files
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     Alternatively, you can create a mapfile with *only* the new files (using
>>>>>>>>>     esgscan_directory), then republish using the --update option (see the sketch below).
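>>>>>>>>>
>>>>>>>>>     A minimal sketch of that alternative, mirroring the esgscan_directory and
>>>>>>>>>     esgpublish invocations quoted elsewhere in this thread (the directory,
>>>>>>>>>     mapfile name, and version are illustrative):
>>>>>>>>>
>>>>>>>>>     % esgscan_directory --read-files --project cmip5 -o new_files.txt \
>>>>>>>>>           /data/CMIP5/output1/NOAA-GFDL/GFDL-CM3/historical/mon/atmos/Amon/r1i1p1/v20120110
>>>>>>>>>     % esgpublish --update --map new_files.txt --project cmip5 --thredds --publish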
>>>>>>>>>
>>>>>>>>>     --Bob
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     On 12/21/11 8:40 AM, "Serguei Nikonov"<serguei.nikonov at noaa.gov>  wrote:
>>>>>>>>>
>>>>>>>>>>     Hi Nate,
>>>>>>>>>>
>>>>>>>>>>     unfortunately this is not the only dataset I have a problem with - there
>>>>>>>>>>     are at least 5 more. Should I unpublish them locally (db, thredds) and
>>>>>>>>>>     then create a new version containing the full set of files? What is the
>>>>>>>>>>     official way to update a dataset?
>>>>>>>>>>
>>>>>>>>>>     Thanks,
>>>>>>>>>>     Sergey
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>     On 12/20/2011 07:06 PM, Nathan Wilhelmi wrote:
>>>>>>>>>>>     Hi Bob/Mike,
>>>>>>>>>>>
>>>>>>>>>>>     I believe the problem is that when files were added the timestamp on the
>>>>>>>>>>>     dataset
>>>>>>>>>>>     wasn't updated.
>>>>>>>>>>>
>>>>>>>>>>>     The triple store will only harvest datasets that have files and a
>>>>>>>>>>>     timestamp updated after the last harvest.
>>>>>>>>>>>
>>>>>>>>>>>     So what likely happened is the dataset was created without files, so it
>>>>>>>>>>>     wasn't
>>>>>>>>>>>     initially harvested. Files were subsequently added, but the timestamp wasn't
>>>>>>>>>>>     updated, so it was still not a candidate for harvesting.
>>>>>>>>>>>
>>>>>>>>>>>     Can you update the date_updated timestamp for the dataset in question and
>>>>>>>>>>>     then trigger the RDF harvesting (see the sketch below)? I believe the
>>>>>>>>>>>     dataset will show up then.
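>>>>>>>>>>>
>>>>>>>>>>>     A minimal sketch of that first step, assuming the gateway metadata lives
>>>>>>>>>>>     in a Postgres database (the database name, table name, and id column are
>>>>>>>>>>>     hypothetical - only the date_updated column is named above - so check the
>>>>>>>>>>>     actual gateway schema first):
>>>>>>>>>>>
>>>>>>>>>>>     % psql -d gateway -c "UPDATE dataset SET date_updated = now() \
>>>>>>>>>>>           WHERE id = 'cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1';"
>>>>>>>>>>>
>>>>>>>>>>>     The harvesting itself can then be re-triggered from the gateway admin UI
>>>>>>>>>>>     (the RDFSynchronizationJobDetail task mentioned further down this thread).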
>>>>>>>>>>>
>>>>>>>>>>>     Thanks!
>>>>>>>>>>>     -Nate
>>>>>>>>>>>
>>>>>>>>>>>     On 12/20/2011 11:49 AM, Serguei Nikonov wrote:
>>>>>>>>>>>>     Hi Mike,
>>>>>>>>>>>>
>>>>>>>>>>>>     I am a member of the data publishers group. I have been publishing a
>>>>>>>>>>>>     considerable amount of data without this kind of trouble, but it occurred
>>>>>>>>>>>>     when I tried to add some files to an existing dataset. Publishing from
>>>>>>>>>>>>     scratch works fine for me.
>>>>>>>>>>>>
>>>>>>>>>>>>     Thanks,
>>>>>>>>>>>>     Sergey
>>>>>>>>>>>>
>>>>>>>>>>>>     On 12/20/2011 01:29 PM, Ganzberger, Michael wrote:
>>>>>>>>>>>>>     Hi Serguei,
>>>>>>>>>>>>>
>>>>>>>>>>>>>     That task is on a scheduler and will re-run every 10 minutes. If your data
>>>>>>>>>>>>>     does not appear after that time, then perhaps there is another issue. One
>>>>>>>>>>>>>     issue could be that publishing to the gateway requires that you have the
>>>>>>>>>>>>>     role of "Data Publisher":
>>>>>>>>>>>>>
>>>>>>>>>>>>>     "check that the account is member of the proper group and has the special
>>>>>>>>>>>>>     role of Data Publisher"
>>>>>>>>>>>>>
>>>>>>>>>>>>>     http://esgf.org/wiki/ESGFNode/FAQ
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Mike
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>     -----Original Message-----
>>>>>>>>>>>>>     From: Serguei Nikonov [mailto:serguei.nikonov at noaa.gov]
>>>>>>>>>>>>>     Sent: Tuesday, December 20, 2011 10:12 AM
>>>>>>>>>>>>>     To: Ganzberger, Michael
>>>>>>>>>>>>>     Cc: Stéphane Senesi; Drach, Bob; go-essp-tech at ucar.edu
>>>>>>>>>>>>>     Subject: Re: [Go-essp-tech] Publishing dataset with option --update
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Hi Mike,
>>>>>>>>>>>>>
>>>>>>>>>>>>>     thanks for the suggestion, but I don't have any privileges to do anything
>>>>>>>>>>>>>     on the gateway.
>>>>>>>>>>>>>     I am just publishing data on the GFDL data node.
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Regards,
>>>>>>>>>>>>>     Sergey
>>>>>>>>>>>>>
>>>>>>>>>>>>>     On 12/20/2011 01:05 PM, Ganzberger, Michael wrote:
>>>>>>>>>>>>>>     Hi Serguei,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     I'd like to suggest this, from
>>>>>>>>>>>>>>     http://esgf.org/wiki/Cmip5Gateway/FAQ, which may help you:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     "The search does not reflect the latest DB changes I've made
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     You have to manually trigger the 3store harvesting. Logging as root and go
>>>>>>>>>>>>>>     to Admin->"Gateway Scheduled Tasks"->"Run tasks" and restart the job named
>>>>>>>>>>>>>>     RDFSynchronizationJobDetail"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Mike Ganzberger
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     -----Original Message-----
>>>>>>>>>>>>>>     From: go-essp-tech-bounces at ucar.edu [mailto:go-essp-tech-bounces at ucar.edu]
>>>>>>>>>>>>>>     On Behalf Of Stéphane Senesi
>>>>>>>>>>>>>>     Sent: Tuesday, December 20, 2011 9:42 AM
>>>>>>>>>>>>>>     To: Serguei Nikonov
>>>>>>>>>>>>>>     Cc: Drach, Bob; go-essp-tech at ucar.edu
>>>>>>>>>>>>>>     Subject: Re: [Go-essp-tech] Publishing dataset with option --update
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Serguei
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     We have for some time now experienced similar problems when publishing
>>>>>>>>>>>>>>     to the PCMDI gateway, i.e. not getting a "SUCCESS" message when
>>>>>>>>>>>>>>     publishing. Sometimes the files are actually published (or at least
>>>>>>>>>>>>>>     accessible through the gateway, their status actually being
>>>>>>>>>>>>>>     "START_PUBLISHING" according to the esg_list_datasets report), sometimes
>>>>>>>>>>>>>>     not. One hypothesis is that the load on the PCMDI Gateway generates the
>>>>>>>>>>>>>>     problem. We haven't yet got a confirmation from Bob.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     In contrast to your case, this happens when publishing a dataset from
>>>>>>>>>>>>>>     scratch (I mean, not an update).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Best regards (but do not expect any feedback from me before early January)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     S
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Serguei Nikonov wrote, On 20/12/2011 18:11:
>>>>>>>>>>>>>>>     Hi Bob,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     I needed to add some missing variables to an existing dataset, and I found
>>>>>>>>>>>>>>>     the --update option of the esgpublish command. When I tried it, I got
>>>>>>>>>>>>>>>     normal messages like
>>>>>>>>>>>>>>>     INFO 2011-12-20 11:21:00,893 Publishing:
>>>>>>>>>>>>>>>     cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1,
>>>>>>>>>>>>>>>     parent = pcmdi.GFDL
>>>>>>>>>>>>>>>     INFO 2011-12-20 11:21:07,564 Result: PROCESSING
>>>>>>>>>>>>>>>     INFO 2011-12-20 11:21:11,209 Result: PROCESSING
>>>>>>>>>>>>>>>     ....
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     but nothing happened on the gateway - the new variables are not there. The
>>>>>>>>>>>>>>>     files corresponding to these variables are in the database and in the
>>>>>>>>>>>>>>>     THREDDS catalog, but apparently they were not published to the gateway.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     I used the command line
>>>>>>>>>>>>>>>     esgpublish --update --keep-version --map <map_file> --project cmip5 --noscan
>>>>>>>>>>>>>>>     --publish
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     Should the map file be in some specific format to make it work in the mode
>>>>>>>>>>>>>>>     I need?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     Thanks,
>>>>>>>>>>>>>>>     Sergey Nikonov
>>>>>>>>>>>>>>>     GFDL
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>
>>>
>
>
>


-- 
Estanislao Gonzalez

Max-Planck-Institut für Meteorologie (MPI-M)
Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany

Phone:   +49 (40) 46 00 94-126
E-Mail:  gonzalez at dkrz.de 


