[Go-essp-tech] Fwd: Re: Visibility of old versions, was... RE: Fwd: Re: Publishing dataset with option --update

Martina Stockhause stockhause at dkrz.de
Wed Jan 11 07:20:14 MST 2012


  Hello Luca,

you can access our Atom feed with the CIM quality documents at: 
http://cera-www.dkrz.de/WDCC/CMIP5/feed/
Every time a QC level 2 or QC level 3 assignment is made, a new 
entry is added to the feed.
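
If it helps with enabling the DOI/QC search in the P2P system, here is a 
minimal sketch for polling that feed (assuming the third-party 
feedparser package):

    import feedparser

    # Each QC level 2/3 assignment appears as one feed entry whose
    # link points to the corresponding CIM quality document.
    feed = feedparser.parse("http://cera-www.dkrz.de/WDCC/CMIP5/feed/")
    for entry in feed.entries:
        print(entry.title, entry.link)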

Best wishes,
Martina


On 11.01.2012 14:10, Michael Lautenschlager wrote:
> Hello Martina,
> could you please provide Luca with the requested information, with a 
> copy to the go-essp-tech list.
> Thanks, Michael
>
>
> -------- Original Message --------
> Subject:     Re: [Go-essp-tech] Visibility of old versions, was... RE: 
> Fwd: Re: Publishing dataset with option --update
> Date:     Wed, 11 Jan 2012 04:18:38 -0800
> From:     Cinquini, Luca (3880) <Luca.Cinquini at jpl.nasa.gov>
> To:     Michael Lautenschlager <lautenschlager at dkrz.de>
> CC:     Karl Taylor <taylor13 at llnl.gov>, 
> "go-essp-tech at ucar.edu" <go-essp-tech at ucar.edu>, "Drach, Bob" 
> <drach1 at llnl.gov>, "serguei.nikonov at noaa.gov" <serguei.nikonov at noaa.gov>
>
>
>
> Hi Michael,
>         sorry if I should know this already, but how can we access the 
> DOI information for a given dataset? The goal is, of course, to 
> enable search on DOIs in the P2P system.
> thanks, Luca
>
> On Jan 11, 2012, at 1:37 AM, Michael Lautenschlager wrote:
>
>>  Hi Karl,
>>  even with respect to the IPCC DDC, I think we have to keep at least the
>>  most recent version of CMIP5 and those versions which went through QC-L3
>>  with assignment of a DOI and citation reference. At least we at WDCC/DKRZ
>>  are under contract with DataCite to keep these DataCite-published data
>>  entries forever, in the sense of common library time scales. The number
>>  and location of replicas can be decided within CMIP5 and does not matter
>>  to DataCite, but we have to ensure identical copies if we link to them
>>  from the DOI landing page.
>>
>>  These DataCite-published CMIP5 data entities may form the GCM data basis
>>  of the IPCC DDC because they are stable, quality-proofed, accessible at
>>  any time and have a citation reference. So these data entities can be
>>  traced back in the scientific literature, provided the citation
>>  references are used there. But I agree we have to discuss this with the
>>  IPCC DDC people.
>>
>>  Best wishes, Michael
>>
>>  ---------------
>>  Dr. Michael Lautenschlager
>>  Head of DKRZ Department Data Management
>>  Director World Data Center Climate
>>
>>  German Climate Computing Centre (DKRZ)
>>  ADDRESS: Bundesstrasse 45a, D-20146 Hamburg, Germany
>>  PHONE:   +4940-460094-118
>>  E-Mail:  lautenschlager at dkrz.de
>>
>>  URL:    http://www.dkrz.de/
>>          http://www.wdc-climate.de/
>>
>>
>>  Managing Director: Prof. Dr. Thomas Ludwig
>>  Registered office: Hamburg
>>  Hamburg District Court, HRB 39784
>>
>>
>>  On 10.01.2012 16:40, Karl Taylor wrote:
>>>  Hi all,
>>>
>>>  thanks for the good discussion. Some good arguments have been made for
>>>  keeping all versions. I'll not make a policy decision immediately, but
>>>  am tending toward strong encouragement to keep all versions. I'll
>>>  distribute a draft statement about this for your input and comment
>>>  before posting. I'll, of course, also consult directly with other IPCC
>>>  DDC folks.
>>>
>>>  Best regards,
>>>  Karl
>>>
>>>  On 1/10/12 5:31 AM, Estanislao Gonzalez wrote:
>>>>  Hi Jamie,
>>>>
>>>>  Indeed, DOIs are not going to solve everything. The DOI is analogous
>>>>  to the ISBN of a book, and citing the whole book is not always what
>>>>  you want anyway. To continue the analogy, users are indeed working
>>>>  with pre-prints which get corrected all the time (i.e. the CMIP5
>>>>  archive is in flux). People are writing papers citing a pre-print. Of
>>>>  course this makes no sense, but they are not doing so willingly; they
>>>>  have to, as the deadline approaches but the computing groups are not
>>>>  ready.
>>>>
>>>>  So what do we have now? Some archives with a strong commitment for
>>>>  preserving data.
>>>>  If the DRS were honored, the URL would be enough for citing any file,
>>>>  as it has the version in it. Indeed, citing 1000+ URLs is not
>>>>  practical, but a redirection could be added so that the scientist
>>>>  cites one URL under which all file URLs are listed (there's no
>>>>  implementation for this AFAIK). But at least the URLs of DRS-committed
>>>>  sites could be safely cited, and if the checksum is attached to the
>>>>  citation, it is certain that the correct file is always being cited
>>>>  (and it could even be found if moved).
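>>>>
>>>>  A minimal sketch of that checksum check (the helper name is
>>>>  hypothetical; MD5 is assumed here, as commonly carried in CMIP5
>>>>  mapfiles):
>>>>
>>>>      import hashlib
>>>>
>>>>      def verify_cited_file(path, cited_checksum, algorithm="md5"):
>>>>          # Recompute the file's checksum and compare it with the
>>>>          # one attached to the citation.
>>>>          h = hashlib.new(algorithm)
>>>>          with open(path, "rb") as f:
>>>>              for chunk in iter(lambda: f.read(1 << 20), b""):
>>>>                  h.update(chunk)
>>>>          return h.hexdigest() == cited_checksum.lower()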
>>>>
>>>>  I don't know how citations are being done now, nor do I know how they
>>>>  were done before, when everyone was citing data that was almost
>>>>  impossible to get. DOIs are the very first step in the right
>>>>  direction, not the last one.
>>>>  IMHO the community should come up with some best practices to
>>>>  overcome the problem we are facing: how to cite something that's
>>>>  permanently changing. Sharing this will certainly help everyone.
>>>>  Before jumping away from this subject I'd also like to add that I
>>>>  don't see any proper communication mechanism in the community. All
>>>>  (or at least most) questions regarding CMIP5 are AFAICT directed to
>>>>  the help-desk, so mostly developers are trying to help the community
>>>>  instead of the community trying to help itself. I think we might be
>>>>  missing some kind of platform for doing this. We don't have the means
>>>>  to support the growing community (and the new communities which we are
>>>>  now serving); we need them to help with the "helping". Just a
>>>>  thought...
>>>>
>>>>  And last, and probably least: the only way to get the latest version
>>>>  of any dataset is by re-issuing the search. Especially since multiple
>>>>  datasets are referred to in a wget script, finding the latest
>>>>  version of each of them "by hand" will be more time-consuming than
>>>>  issuing the search query again.
>>>>
>>>>  Thanks,
>>>>  Estani
>>>>
>>>>  On 10.01.2012 13:00, Kettleborough, Jamie wrote:
>>>>>  Hello,
>>>>>  I'm not sure how to say this, but I'm not sure it's just down to
>>>>>  DOIs to determine whether a data set should always be visible. I
>>>>>  think data needs to be visible where it's sufficiently important that
>>>>>  a user might want to download it, e.g. they want to check or extend
>>>>>  someone else's study (and I think there are other reasons). It's not
>>>>>  clear to me that all data of this kind will have a DOI - for
>>>>>  instance, how many of the datasets referenced in papers being written
>>>>>  now for the summer deadline of AR5 have (or will have in time) DOIs?
>>>>>  I know it's tempting to say that any dataset referenced in a paper
>>>>>  should have a DOI. But I think you need to be realistic about the
>>>>>  prospects of this happening on the right timescales.
>>>>>  If the DOI is used as the determinant of whether data is always
>>>>>  visible, then should users be made aware of the risk they are
>>>>>  carrying now? For instance, so they know to keep local backups of
>>>>>  data that is really important to them. (With the possible
>>>>>  implication too that they may need to be prepared to 're-share' this
>>>>>  data with others.)
>>>>>  For what it's worth, my personal preference is for the BADC/DKRZ (and
>>>>>  I'm sure others') philosophy of keeping all versions - though I
>>>>>  realise there are costs in doing this, like getting DRSlib
>>>>>  sufficiently bug-free and getting it to work in all the contexts it
>>>>>  needs to (hard links/soft links), getting it deployed, getting the
>>>>>  update mechanism in place for when new bugs are found, etc. If you
>>>>>  used DRSlib, doesn't Estani's use case that caused user grief become
>>>>>  easier too - the wget scripts do not need regenerating; you should
>>>>>  instead be able to replace the version strings in the URLs (though I
>>>>>  may be assuming things about load balancing etc. in saying this).
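>>>>>
>>>>>  A minimal sketch of that version-string replacement (assuming
>>>>>  DRS-style URLs with a /vNNNNNNNN/ path component; the helper name
>>>>>  is hypothetical):
>>>>>
>>>>>      import re
>>>>>
>>>>>      def bump_version(url, new_version):
>>>>>          # Swap the DRS version component, e.g. /v20110601/ ->
>>>>>          # /v20111206/, leaving the rest of the URL untouched.
>>>>>          return re.sub(r"/v\d+/", "/%s/" % new_version, url)
>>>>>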
>>>>>  Jamie
>>>>>
>>>>>     
>>>>> ------------------------------------------------------------------------ 
>>>>>
>>>>>     *From:* go-essp-tech-bounces at ucar.edu
>>>>>     [mailto:go-essp-tech-bounces at ucar.edu] *On Behalf Of *Estanislao
>>>>>     Gonzalez
>>>>>     *Sent:* 10 January 2012 10:21
>>>>>     *To:* Karl Taylor
>>>>>     *Cc:* Drach, Bob; go-essp-tech at ucar.edu; serguei.nikonov at noaa.gov
>>>>>     *Subject:* Re: [Go-essp-tech] Fwd: Re: Publishing dataset with
>>>>>     option --update
>>>>>
>>>>>     Well, to be honest, I do agree this is a decision each institution
>>>>>     has to make, but for us I'd prefer offering everything we have
>>>>>     and letting the systems decide what to do with this information.
>>>>>     I.e. I've used it to generate some comments (I might have
>>>>>     already shown you this); just go to
>>>>>     
>>>>> http://ipcc-ar5.dkrz.de/dataset/cmip5.output1.NCC.NorESM1-M.sstClim.mon.land.Lmon.r1i1p1.html
>>>>>     and click on History.
>>>>>     That information could be generated only because we store the
>>>>>     metadata to the previous version.
>>>>>
>>>>>     By the way, the only way of preventing the user from getting an
>>>>>     older version, if that's what is wanted, is by either removing
>>>>>     the files from the TDS-served directory or changing the access
>>>>>     restriction at the Gateway. Because of a well-known TDS bug (or
>>>>>     feature), files present in that directory but not found in any
>>>>>     catalog are served without any restriction (AFAIK no certificate
>>>>>     is required for this). So, normally, the wget script would work
>>>>>     even if the files were unpublished.
>>>>>
>>>>>     It really depends on the use-case... but e.g. I had to explain
>>>>>     all this to a couple of people at the help-desk since the wget
>>>>>     script they had downloaded wasn't working anymore (files were
>>>>>     removed). They weren't thrilled to learn they had to re-issue
>>>>>     the search (there's no workaround for this), and they wanted to
>>>>>     know what had changed in the new version, and that's where we
>>>>>     can't help our users any more, since we don't have that
>>>>>     information...
>>>>>
>>>>>     I don't know what our users prefer, but I think they have more
>>>>>     important problems to cope with at this time... if they could
>>>>>     reliably get one version they could start worrying about others.
>>>>>      From my perspective as a data manager, it's worth the tiny
>>>>>     additional effort, if there's any.
>>>>>
>>>>>     Cheers,
>>>>>     Estani
>>>>>
>>>>>     On 09.01.2012 20:05, Karl Taylor wrote:
>>>>>>     Hi Estani,
>>>>>>
>>>>>>     I agree that a new version number should (I'd say must) be
>>>>>>     assigned when any changes are made. However, except for DOI
>>>>>>     datasets, most groups will not want older versions to be
>>>>>>     visible or downloadable.
>>>>>>
>>>>>>     Do you agree?
>>>>>>
>>>>>>     cheers,
>>>>>>     Karl
>>>>>>
>>>>>>     On 1/9/12 10:37 AM, Estanislao Gonzalez wrote:
>>>>>>>     Hi Karl,
>>>>>>>
>>>>>>>     It is indeed a good point, but I must add that we are not
>>>>>>>     talking about preserving a version (although we do that here at
>>>>>>>     DKRZ) but about signaling that a version has changed. So the
>>>>>>>     version is a key for finding a specific dataset that changes
>>>>>>>     over time.
>>>>>>>
>>>>>>>     Even before a DOI assignment, I'd encourage everyone to create
>>>>>>>     a new version every time the dataset changes in any way.
>>>>>>>     Institutions have the right to preserve whatever versions they
>>>>>>>     want (they may even delete DOI-assigned versions; archives, on
>>>>>>>     the other hand, can't - that's what archives are for).
>>>>>>>     But altering a dataset while preserving the version just brings
>>>>>>>     chaos for the users and for us at the help-desk, as we have to
>>>>>>>     explain why something has changed (or rather answer that we
>>>>>>>     don't know why...). It means that the same key now points to a
>>>>>>>     different dataset.
>>>>>>>
>>>>>>>     The only benefits I can see in preserving the same version are
>>>>>>>     that publishing using the same version seems to be easier for
>>>>>>>     some (for our workflow it's not, it's exactly the same) and
>>>>>>>     that, if only new files are added, this seems to work fine for
>>>>>>>     publication at both the data-node and the gateway, as it's
>>>>>>>     properly supported.
>>>>>>>     If anything else changes, this does not work as expected
>>>>>>>     (wrong checksums, ghost files at the gateway, etc.). And
>>>>>>>     changing a version's contents makes no sense to the user IMHO
>>>>>>>     (e.g. it's as if you might sometimes get more files from a
>>>>>>>     tarred file... how often should you extract it to be sure you
>>>>>>>     got "all of them"?)
>>>>>>>
>>>>>>>     If old versions were preserved (which takes almost no resources
>>>>>>>     if using hard links), a simple comparison would show that the
>>>>>>>     only changes were the addition of some specific files.
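>>>>>>>
>>>>>>>     A minimal sketch of that approach (hypothetical helper names;
>>>>>>>     Python 3; the comparison is top-level only, for illustration):
>>>>>>>
>>>>>>>         import filecmp
>>>>>>>         import os
>>>>>>>
>>>>>>>         def snapshot_version(old_dir, new_dir):
>>>>>>>             # Build a new version tree whose files are hard links
>>>>>>>             # to the old one, so unchanged files cost no space.
>>>>>>>             for root, dirs, files in os.walk(old_dir):
>>>>>>>                 rel = os.path.relpath(root, old_dir)
>>>>>>>                 target = os.path.join(new_dir, rel)
>>>>>>>                 os.makedirs(target, exist_ok=True)
>>>>>>>                 for name in files:
>>>>>>>                     os.link(os.path.join(root, name),
>>>>>>>                             os.path.join(target, name))
>>>>>>>
>>>>>>>         def diff_versions(dir_a, dir_b):
>>>>>>>             # Files only in a, only in b, and in both but changed.
>>>>>>>             cmp = filecmp.dircmp(dir_a, dir_b)
>>>>>>>             return cmp.left_only, cmp.right_only, cmp.diff_files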
>>>>>>>
>>>>>>>     Basically, reusing the version results in a non-recoverable
>>>>>>>     loss of information. That's why I discourage it.
>>>>>>>
>>>>>>>     My 2c,
>>>>>>>     Estani
>>>>>>>
>>>>>>>     On 09.01.2012 17:25, Karl Taylor wrote:
>>>>>>>>     Dear all,
>>>>>>>>
>>>>>>>>     I do not have time to read this thoroughly, so perhaps what
>>>>>>>>     I'll mention here is irrelevant. There may be some
>>>>>>>>     miscommunication about what is meant by "version". There are
>>>>>>>>     two cases to consider:
>>>>>>>>
>>>>>>>>     1. Before a dataset has become official (i.e., assigned a
>>>>>>>>     DOI), a group may choose to remove all record of it from the
>>>>>>>>     database and publish a replacement version.
>>>>>>>>
>>>>>>>>     2. Alternatively, if a group wants to preserve a previous
>>>>>>>>     version (as is required after a DOI has been assigned), then
>>>>>>>>     the new version will not "replace" the previous version, but
>>>>>>>>     simply be added to the archive.
>>>>>>>>
>>>>>>>>     It is possible that different publication procedures will
>>>>>>>>     apply in these different cases.
>>>>>>>>
>>>>>>>>     best,
>>>>>>>>     Karl
>>>>>>>>
>>>>>>>>     On 1/9/12 4:26 AM, Estanislao Gonzalez wrote:
>>>>>>>>>     Just to mention that we do the same thing. We use --new-version
>>>>>>>>>     directly and a map file containing all files for the new
>>>>>>>>>     version, but we do create hard links to the files being reused,
>>>>>>>>>     so they are indeed all "new", as their paths always differ from
>>>>>>>>>     those of previous versions. (In any case, for the publisher
>>>>>>>>>     they are the same, and it thus encodes them with the nc_0 name,
>>>>>>>>>     if I recall correctly.)
>>>>>>>>>
>>>>>>>>>     Thanks,
>>>>>>>>>     Estani
>>>>>>>>>     On 09.01.2012 12:15, stephen.pascoe at stfc.ac.uk wrote:
>>>>>>>>>>     Hi Bob,
>>>>>>>>>>
>>>>>>>>>>     This "unpublish first" requirement is news to me.  We've been
>>>>>>>>>>     publishing new versions without doing this for some time.
>>>>>>>>>>     Now, we have come across difficulties with a few datasets,
>>>>>>>>>>     but it's generally worked.
>>>>>>>>>>
>>>>>>>>>>     We don't use the --update option, though.  Each time we
>>>>>>>>>>     publish a new version we provide a mapfile of all files in
>>>>>>>>>>     the dataset(s).  I'd recommend Sergey try doing this before
>>>>>>>>>>     removing a previous version.
>>>>>>>>>>
>>>>>>>>>>     If you unpublish from the Gateway first you'll lose the
>>>>>>>>>>     information in the "History" tab.  For instance,
>>>>>>>>>>     http://cmip-gw.badc.rl.ac.uk/dataset/cmip5.output2.MOHC.HadGEM2-ES.rcp85.mon.aerosol.aero.r1i1p1.html
>>>>>>>>>>     shows 2 versions.
>>>>>>>>>>
>>>>>>>>>>     Stephen.
>>>>>>>>>>
>>>>>>>>>>     ---
>>>>>>>>>>     Stephen Pascoe  +44 (0)1235 445980
>>>>>>>>>>     Centre of Environmental Data Archival
>>>>>>>>>>     STFC Rutherford Appleton Laboratory, Harwell Oxford, 
>>>>>>>>>> Didcot OX11 0QX, UK
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>     -----Original Message-----
>>>>>>>>>>     From: go-essp-tech-bounces at ucar.edu
>>>>>>>>>>     [mailto:go-essp-tech-bounces at ucar.edu] On Behalf Of Drach, Bob
>>>>>>>>>>     Sent: 06 January 2012 20:53
>>>>>>>>>>     To: Serguei Nikonov; Eric Nienhouse
>>>>>>>>>>     Cc: go-essp-tech at ucar.edu
>>>>>>>>>>     Subject: Re: [Go-essp-tech] Fwd: Re: Publishing dataset 
>>>>>>>>>> with option --update
>>>>>>>>>>
>>>>>>>>>>     Hi Sergey,
>>>>>>>>>>
>>>>>>>>>>     When updating a dataset, it's also important to unpublish it
>>>>>>>>>>     before publishing the new version. E.g., first run
>>>>>>>>>>
>>>>>>>>>>     esgunpublish <dataset_id>
>>>>>>>>>>
>>>>>>>>>>     The reason is that, when you publish to the gateway, the
>>>>>>>>>>     gateway software tries to *add* the new information to the
>>>>>>>>>>     existing dataset entry, rather than replace it.
>>>>>>>>>>
>>>>>>>>>>     --Bob
>>>>>>>>>>     ________________________________________
>>>>>>>>>>     From: Serguei Nikonov [serguei.nikonov at noaa.gov]
>>>>>>>>>>     Sent: Friday, January 06, 2012 10:45 AM
>>>>>>>>>>     To: Eric Nienhouse
>>>>>>>>>>     Cc: Bob Drach; go-essp-tech at ucar.edu
>>>>>>>>>>     Subject: Re: [Go-essp-tech] Fwd: Re:  Publishing dataset 
>>>>>>>>>> with option --update
>>>>>>>>>>
>>>>>>>>>>     Hi Eric,
>>>>>>>>>>
>>>>>>>>>>     thanks for your help. I have no objections to whatever
>>>>>>>>>>     versioning policy is adopted; what I need to know is how to
>>>>>>>>>>     apply it. The ways I tried did not work for me. Hopefully the
>>>>>>>>>>     reason is the bad entries in THREDDS and the database you
>>>>>>>>>>     pointed out. I am cleaning them up right now, then will see...
>>>>>>>>>>
>>>>>>>>>>     Just for clarification: if I need to update a dataset
>>>>>>>>>>     (changing the version), I create a map file containing the
>>>>>>>>>>     full set of files (old and new ones) and then use this map
>>>>>>>>>>     file with esgpublish and the --update option - is that
>>>>>>>>>>     correct? Will that be enough to create the new version of the
>>>>>>>>>>     dataset? BTW, there is nothing about versions under the
>>>>>>>>>>     'update' option in the esgpublish help.
>>>>>>>>>>
>>>>>>>>>>     Thanks,
>>>>>>>>>>     Sergey
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>     On 01/04/2012 04:27 PM, Eric Nienhouse wrote:
>>>>>>>>>>>     Hi Serguei,
>>>>>>>>>>>
>>>>>>>>>>>     Following are a few more suggestions for diagnosing this
>>>>>>>>>>>     publishing issue. I agree with others on this thread that
>>>>>>>>>>>     adding new files (or changing existing ones) should always
>>>>>>>>>>>     trigger a new dataset version.
>>>>>>>>>>>
>>>>>>>>>>>     It does not appear you are receiving a final "SUCCESS" or
>>>>>>>>>>>     failure message when publishing to the Gateway (with
>>>>>>>>>>>     esgpublish --publish). Please try increasing your "polling"
>>>>>>>>>>>     levels in your $ESGINI file. E.g.:
>>>>>>>>>>>
>>>>>>>>>>>     hessian_service_polling_delay = 10
>>>>>>>>>>>     hessian_service_polling_iterations = 500
>>>>>>>>>>>
>>>>>>>>>>>     You should see a final "SUCCESS" or "ERROR" with Java 
>>>>>>>>>>> trace output at the
>>>>>>>>>>>     termination of the command.
>>>>>>>>>>>
>>>>>>>>>>>     I've reviewed the Thredds catalog for the dataset you 
>>>>>>>>>>> note below:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     
>>>>>>>>>>> http://esgdata.gfdl.noaa.gov/thredds/esgcet/1/cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.v2.xml
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     There appear to be multiple instances of certain files
>>>>>>>>>>>     within the catalog, which is a problem. The Gateway publish
>>>>>>>>>>>     will fail if a particular file (URL) is referenced multiple
>>>>>>>>>>>     times with differing metadata. An example is:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     
>>>>>>>>>>> */gfdl_dataroot/NOAA-GFDL/GFDL-CM3/historical/mon/atmos/Amon/r1i1p1/v20110601/rtmt/rtmt_Amon_GFDL-CM3_historical_r1i1p1_186001-186412.nc
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     This file appears as two separate file versions in the
>>>>>>>>>>>     Thredds catalog (one with an id ending in ".nc" and another
>>>>>>>>>>>     with ".nc_0"). There should be only one reference to this
>>>>>>>>>>>     file URL in the catalog.
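>>>>>>>>>>>
>>>>>>>>>>>     A minimal sketch for spotting such duplicates (hypothetical
>>>>>>>>>>>     helper; assumes the standard THREDDS InvCatalog 1.0
>>>>>>>>>>>     namespace and urlPath attributes):
>>>>>>>>>>>
>>>>>>>>>>>         from collections import Counter
>>>>>>>>>>>         from xml.etree import ElementTree
>>>>>>>>>>>
>>>>>>>>>>>         NS = "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}"
>>>>>>>>>>>
>>>>>>>>>>>         def duplicate_url_paths(catalog_file):
>>>>>>>>>>>             # Count every urlPath in the catalog and report
>>>>>>>>>>>             # those referenced more than once.
>>>>>>>>>>>             counts = Counter(
>>>>>>>>>>>                 el.get("urlPath")
>>>>>>>>>>>                 for el in ElementTree.parse(catalog_file)
>>>>>>>>>>>                                      .iter(NS + "dataset")
>>>>>>>>>>>                 if el.get("urlPath"))
>>>>>>>>>>>             return [p for p, n in counts.items() if n > 1]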
>>>>>>>>>>>
>>>>>>>>>>>     The previous version of the dataset in the publisher/node
>>>>>>>>>>>     database may be leading to this issue. You may need to add
>>>>>>>>>>>     "--database-delete" to your esgunpublish command to clean
>>>>>>>>>>>     things up. Bob can advise on this. Note that the original
>>>>>>>>>>>     esgpublish command shown in this email thread included
>>>>>>>>>>>     "--keep-version".
>>>>>>>>>>>
>>>>>>>>>>>     After publishing to the Gateway successfully, you can 
>>>>>>>>>>> check the dataset details
>>>>>>>>>>>     by URL with the published dataset identifier. For example:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     
>>>>>>>>>>> http://pcmdi3.llnl.gov/esgcet/dataset/cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.html
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     I hope this helps.
>>>>>>>>>>>
>>>>>>>>>>>     Regards,
>>>>>>>>>>>
>>>>>>>>>>>     -Eric
>>>>>>>>>>>
>>>>>>>>>>>     Serguei Nikonov wrote:
>>>>>>>>>>>>     Hi Bob,
>>>>>>>>>>>>
>>>>>>>>>>>>     I still cannot get dataset updates to work. The commands
>>>>>>>>>>>>     you suggested executed successfully, but the datasets did
>>>>>>>>>>>>     not appear on the gateway. I tried several times with
>>>>>>>>>>>>     different datasets, but the result is the same.
>>>>>>>>>>>>
>>>>>>>>>>>>     Do you have any idea what to try in such a situation?
>>>>>>>>>>>>
>>>>>>>>>>>>     Here are some details about what I tried.
>>>>>>>>>>>>     I needed to add a file to the dataset
>>>>>>>>>>>>     
>>>>>>>>>>>> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1. 
>>>>>>>>>>>>
>>>>>>>>>>>>     As you advised, I unpublished it (esgunpublish
>>>>>>>>>>>>     cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1)
>>>>>>>>>>>>     and then created a full mapfile (with the additional file)
>>>>>>>>>>>>     and published it:
>>>>>>>>>>>>     esgpublish --read-files --map new_mapfile --project cmip5
>>>>>>>>>>>>     --thredds --publish
>>>>>>>>>>>>
>>>>>>>>>>>>     As I said, there were no errors. The dataset is in the
>>>>>>>>>>>>     database and in THREDDS, but not on the gateway.
>>>>>>>>>>>>
>>>>>>>>>>>>     The second way I tried was using a mapfile containing only
>>>>>>>>>>>>     the files to update. I needed to substitute new files for
>>>>>>>>>>>>     several existing files in the dataset, so I created a
>>>>>>>>>>>>     mapfile with only the files to be substituted:
>>>>>>>>>>>>     esgscan_directory --read-files --project cmip5 -o mapfile.txt
>>>>>>>>>>>>     /data/CMIP5/output1/NOAA-GFDL/GFDL-ESM2M/historical/mon/ocean/Omon/r1i1p1/v20111206
>>>>>>>>>>>>
>>>>>>>>>>>>     and then published it with the update option:
>>>>>>>>>>>>     esgpublish --update --map mapfile.txt --project cmip5
>>>>>>>>>>>>     --thredds --publish
>>>>>>>>>>>>
>>>>>>>>>>>>     The result is the same as in the previous case: everything
>>>>>>>>>>>>     is fine locally, but nothing happened on the gateway.
>>>>>>>>>>>>
>>>>>>>>>>>>     Thanks,
>>>>>>>>>>>>     Sergey
>>>>>>>>>>>>
>>>>>>>>>>>>     -------- Original Message --------
>>>>>>>>>>>>     Subject: Re: [Go-essp-tech] Publishing dataset with 
>>>>>>>>>>>> option --update
>>>>>>>>>>>>     Date: Thu, 29 Dec 2011 11:02:05 -0500
>>>>>>>>>>>>     From: Serguei Nikonov<Serguei.Nikonov at noaa.gov>
>>>>>>>>>>>>     Organization: GFDL
>>>>>>>>>>>>     To: Drach, Bob<drach1 at llnl.gov>
>>>>>>>>>>>>     CC: Nathan Wilhelmi<wilhelmi at ucar.edu>, "Ganzberger, 
>>>>>>>>>>>> Michael"
>>>>>>>>>>>> <Ganzberger1 at llnl.gov>,"go-essp-tech at ucar.edu"<go-essp-tech at ucar.edu>
>>>>>>>>>>>>
>>>>>>>>>>>>     Hi Bob,
>>>>>>>>>>>>
>>>>>>>>>>>>     I tried the first way you suggested, and it worked
>>>>>>>>>>>>     partially: the dataset was created on the data node with
>>>>>>>>>>>>     version 2, but it did not show up on the gateway. To make
>>>>>>>>>>>>     sure this was not a one-off result, I repeated it with
>>>>>>>>>>>>     other datasets, with the same result.
>>>>>>>>>>>>     Now I have 2 datasets on the data node (visible in the
>>>>>>>>>>>>     THREDDS server) that are absent from the gateway:
>>>>>>>>>>>>     
>>>>>>>>>>>> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1.v2 
>>>>>>>>>>>>
>>>>>>>>>>>>     
>>>>>>>>>>>> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r2i1p1.v2. 
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>     Does it make sense to repeat esgpublish with the --publish
>>>>>>>>>>>>     option?
>>>>>>>>>>>>
>>>>>>>>>>>>     Thanks and Happy New Year,
>>>>>>>>>>>>     Sergey
>>>>>>>>>>>>
>>>>>>>>>>>>     On 12/21/2011 08:41 PM, Drach, Bob wrote:
>>>>>>>>>>>>>     Hi Sergey,
>>>>>>>>>>>>>
>>>>>>>>>>>>>     The way I would recommend adding new files to an 
>>>>>>>>>>>>> existing dataset is as
>>>>>>>>>>>>>     follows:
>>>>>>>>>>>>>
>>>>>>>>>>>>>     - Unpublish the previous dataset from the gateway and 
>>>>>>>>>>>>> thredds
>>>>>>>>>>>>>
>>>>>>>>>>>>>     % esgunpublish
>>>>>>>>>>>>>     
>>>>>>>>>>>>> cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1 
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>     - Add the new files to the existing mapfile for the 
>>>>>>>>>>>>> dataset they are being
>>>>>>>>>>>>>     added to.
>>>>>>>>>>>>>
>>>>>>>>>>>>>     - Republish with the expanded mapfile:
>>>>>>>>>>>>>
>>>>>>>>>>>>>     % esgpublish --read-files --map newmap.txt --project 
>>>>>>>>>>>>> cmip5 --thredds
>>>>>>>>>>>>>     --publish
>>>>>>>>>>>>>
>>>>>>>>>>>>>     The publisher will:
>>>>>>>>>>>>>     - not rescan existing files, only the new files
>>>>>>>>>>>>>     - create a new version to reflect the additional files
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Alternatively, you can create a mapfile with *only* the
>>>>>>>>>>>>>     new files (using esgscan_directory), then republish using
>>>>>>>>>>>>>     the --update option.
>>>>>>>>>>>>>
>>>>>>>>>>>>>     --Bob
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>     On 12/21/11 8:40 AM, "Serguei 
>>>>>>>>>>>>> Nikonov"<serguei.nikonov at noaa.gov>    wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Hi Nate,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     unfortunately this is not the only dataset I have a
>>>>>>>>>>>>>>     problem with - there are at least 5 more. Should I
>>>>>>>>>>>>>>     unpublish them locally (DB, THREDDS) and then create a
>>>>>>>>>>>>>>     new version containing the full set of files? What is
>>>>>>>>>>>>>>     the official way to update a dataset?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Thanks,
>>>>>>>>>>>>>>     Sergey
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     On 12/20/2011 07:06 PM, Nathan Wilhelmi wrote:
>>>>>>>>>>>>>>>     Hi Bob/Mike,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     I believe the problem is that when the files were
>>>>>>>>>>>>>>>     added, the timestamp on the dataset wasn't updated.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     The triple store will only harvest datasets that have
>>>>>>>>>>>>>>>     files and a timestamp updated after the last harvest.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     So what likely happened is that the dataset was created
>>>>>>>>>>>>>>>     without files, so it wasn't initially harvested. Files
>>>>>>>>>>>>>>>     were subsequently added, but the timestamp wasn't
>>>>>>>>>>>>>>>     updated, so it was still not a candidate for harvesting.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     Can you update the date_updated timestamp for the
>>>>>>>>>>>>>>>     dataset in question and then trigger the RDF harvesting?
>>>>>>>>>>>>>>>     I believe the dataset will show up then.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     Thanks!
>>>>>>>>>>>>>>>     -Nate
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     On 12/20/2011 11:49 AM, Serguei Nikonov wrote:
>>>>>>>>>>>>>>>>     Hi Mike,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     I am a member of the data publishers group. I have
>>>>>>>>>>>>>>>>     been publishing a considerable amount of data without
>>>>>>>>>>>>>>>>     this kind of trouble, but this problem occurred only
>>>>>>>>>>>>>>>>     when I tried to add some files to an existing dataset.
>>>>>>>>>>>>>>>>     Publishing from scratch works fine for me.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     Thanks,
>>>>>>>>>>>>>>>>     Sergey
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     On 12/20/2011 01:29 PM, Ganzberger, Michael wrote:
>>>>>>>>>>>>>>>>>     Hi Serguei,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     That task is on a scheduler and will re-run every 10
>>>>>>>>>>>>>>>>>     minutes. If your data does not appear after that time
>>>>>>>>>>>>>>>>>     then perhaps there is another issue. One issue could
>>>>>>>>>>>>>>>>>     be that publishing to the gateway requires that you
>>>>>>>>>>>>>>>>>     have the role of "Data Publisher":
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     "check that the account is member of the proper 
>>>>>>>>>>>>>>>>> group and has the special
>>>>>>>>>>>>>>>>>     role of Data Publisher"
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     http://esgf.org/wiki/ESGFNode/FAQ
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     Mike
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     -----Original Message-----
>>>>>>>>>>>>>>>>>     From: Serguei Nikonov 
>>>>>>>>>>>>>>>>> [mailto:serguei.nikonov at noaa.gov]
>>>>>>>>>>>>>>>>>     Sent: Tuesday, December 20, 2011 10:12 AM
>>>>>>>>>>>>>>>>>     To: Ganzberger, Michael
>>>>>>>>>>>>>>>>>     Cc: Stéphane Senesi; Drach, Bob; go-essp-tech at ucar.edu
>>>>>>>>>>>>>>>>>     Subject: Re: [Go-essp-tech] Publishing dataset 
>>>>>>>>>>>>>>>>> with option --update
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     Hi Mike,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     thanks for the suggestion, but I don't have any
>>>>>>>>>>>>>>>>>     privileges to do anything on the gateway. I am just
>>>>>>>>>>>>>>>>>     publishing data on the GFDL data node.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     Regards,
>>>>>>>>>>>>>>>>>     Sergey
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     On 12/20/2011 01:05 PM, Ganzberger, Michael wrote:
>>>>>>>>>>>>>>>>>>     Hi Serguei,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     I'd like to suggest this, which may help you, from
>>>>>>>>>>>>>>>>>>     http://esgf.org/wiki/Cmip5Gateway/FAQ
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     "The search does not reflect the latest DB changes
>>>>>>>>>>>>>>>>>>     I've made
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     You have to manually trigger the 3store harvesting.
>>>>>>>>>>>>>>>>>>     Log in as root, go to Admin->"Gateway Scheduled
>>>>>>>>>>>>>>>>>>     Tasks"->"Run tasks" and restart the job named
>>>>>>>>>>>>>>>>>>     RDFSynchronizationJobDetail"
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     Mike Ganzberger
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     -----Original Message-----
>>>>>>>>>>>>>>>>>>     From: go-essp-tech-bounces at ucar.edu
>>>>>>>>>>>>>>>>>>     [mailto:go-essp-tech-bounces at ucar.edu]
>>>>>>>>>>>>>>>>>>     On Behalf Of Stéphane Senesi
>>>>>>>>>>>>>>>>>>     Sent: Tuesday, December 20, 2011 9:42 AM
>>>>>>>>>>>>>>>>>>     To: Serguei Nikonov
>>>>>>>>>>>>>>>>>>     Cc: Drach, Bob; go-essp-tech at ucar.edu
>>>>>>>>>>>>>>>>>>     Subject: Re: [Go-essp-tech] Publishing dataset 
>>>>>>>>>>>>>>>>>> with option --update
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     Serguei
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     We have for some time now experienced similar
>>>>>>>>>>>>>>>>>>     problems when publishing to the PCMDI gateway, i.e.
>>>>>>>>>>>>>>>>>>     not getting a "SUCCESS" message when publishing.
>>>>>>>>>>>>>>>>>>     Sometimes the files are actually published (or at
>>>>>>>>>>>>>>>>>>     least accessible through the gateway, their status
>>>>>>>>>>>>>>>>>>     actually being "START_PUBLISHING" according to the
>>>>>>>>>>>>>>>>>>     esg_list_datasets report), sometimes not. One
>>>>>>>>>>>>>>>>>>     hypothesis is that the load on the PCMDI Gateway
>>>>>>>>>>>>>>>>>>     generates the problem. We haven't yet got
>>>>>>>>>>>>>>>>>>     confirmation from Bob.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     In contrast to your case, this happens when
>>>>>>>>>>>>>>>>>>     publishing a dataset from scratch (I mean, not an
>>>>>>>>>>>>>>>>>>     update).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     Best regards (do not expect any feedback from me
>>>>>>>>>>>>>>>>>>     until early January, though)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     S
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     Serguei Nikonov wrote, On 20/12/2011 18:11:
>>>>>>>>>>>>>>>>>>>     Hi Bob,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     I needed to add some missing variables to an
>>>>>>>>>>>>>>>>>>>     existing dataset, and I found the --update option
>>>>>>>>>>>>>>>>>>>     of the esgpublish command. When I tried it, I got a
>>>>>>>>>>>>>>>>>>>     normal-looking message:
>>>>>>>>>>>>>>>>>>>     INFO 2011-12-20 11:21:00,893 Publishing:
>>>>>>>>>>>>>>>>>>>     cmip5.output1.NOAA-GFDL.GFDL-CM3.historical.mon.atmos.Amon.r1i1p1,
>>>>>>>>>>>>>>>>>>>     parent = pcmdi.GFDL
>>>>>>>>>>>>>>>>>>>     INFO 2011-12-20 11:21:07,564 Result: PROCESSING
>>>>>>>>>>>>>>>>>>>     INFO 2011-12-20 11:21:11,209 Result: PROCESSING
>>>>>>>>>>>>>>>>>>>     ....
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     but nothing happened on the gateway - the new
>>>>>>>>>>>>>>>>>>>     variables are not there. The files corresponding to
>>>>>>>>>>>>>>>>>>>     these variables are in the database and in the
>>>>>>>>>>>>>>>>>>>     THREDDS catalog, but apparently were not published
>>>>>>>>>>>>>>>>>>>     to the gateway.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     I used the command line:
>>>>>>>>>>>>>>>>>>>     esgpublish --update --keep-version --map <map_file>
>>>>>>>>>>>>>>>>>>>     --project cmip5 --noscan --publish
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     Should the map file be in some specific format to
>>>>>>>>>>>>>>>>>>>     make it work in the mode I need?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     Thanks,
>>>>>>>>>>>>>>>>>>>     Sergey Nikonov
>>>>>>>>>>>>>>>>>>>     GFDL
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     _______________________________________________
>>>>>>>>>>>>>>>>>>>     GO-ESSP-TECH mailing list
>>>>>>>>>>>>>>>>>>>     GO-ESSP-TECH at ucar.edu
>>>>>>>>>>>>>>>>>>>     
>>>>>>>>>>>>>>>>>>> http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     _______________________________________________
>>>>>>>>>>>>>>>>     GO-ESSP-TECH mailing list
>>>>>>>>>>>>>>>>     GO-ESSP-TECH at ucar.edu
>>>>>>>>>>>>>>>>     http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>>>>>>>>>     _______________________________________________
>>>>>>>>>>>>     GO-ESSP-TECH mailing list
>>>>>>>>>>>>     GO-ESSP-TECH at ucar.edu
>>>>>>>>>>>>     http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>>>>>>>     _______________________________________________
>>>>>>>>>>     GO-ESSP-TECH mailing list
>>>>>>>>>>     GO-ESSP-TECH at ucar.edu
>>>>>>>>>>     http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>>>>>>     --
>>>>>>>>>     Estanislao Gonzalez
>>>>>>>>>
>>>>>>>>>     Max-Planck-Institut für Meteorologie (MPI-M)
>>>>>>>>>     Deutsches Klimarechenzentrum (DKRZ) - German Climate 
>>>>>>>>> Computing Centre
>>>>>>>>>     Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany
>>>>>>>>>
>>>>>>>>>     Phone:   +49 (40) 46 00 94-126
>>>>>>>>>     E-Mail:  gonzalez at dkrz.de
>>>>>>>>>
>>>>>>>>>     _______________________________________________
>>>>>>>>>     GO-ESSP-TECH mailing list
>>>>>>>>>     GO-ESSP-TECH at ucar.edu
>>>>>>>>>     http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>>>>>>
>>>>>>>
>>>>>>>     --
>>>>>>>     Estanislao Gonzalez
>>>>>>>
>>>>>>>     Max-Planck-Institut für Meteorologie (MPI-M)
>>>>>>>     Deutsches Klimarechenzentrum (DKRZ) - German Climate 
>>>>>>> Computing Centre
>>>>>>>     Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany
>>>>>>>
>>>>>>>     Phone:   +49 (40) 46 00 94-126
>>>>>>>     E-Mail:  gonzalez at dkrz.de
>>>>>
>>>>>
>>>>>     --
>>>>>     Estanislao Gonzalez
>>>>>
>>>>>     Max-Planck-Institut für Meteorologie (MPI-M)
>>>>>     Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing 
>>>>> Centre
>>>>>     Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany
>>>>>
>>>>>     Phone:   +49 (40) 46 00 94-126
>>>>>     E-Mail:  gonzalez at dkrz.de
>>>>>
>>>>
>>>>
>>>>  --
>>>>  Estanislao Gonzalez
>>>>
>>>>  Max-Planck-Institut für Meteorologie (MPI-M)
>>>>  Deutsches Klimarechenzentrum (DKRZ) - German Climate Computing Centre
>>>>  Room 108 - Bundesstrasse 45a, D-20146 Hamburg, Germany
>>>>
>>>>  Phone:   +49 (40) 46 00 94-126
>>>>  E-Mail:  gonzalez at dkrz.de
>>>
>>>
>>>  _______________________________________________
>>>  GO-ESSP-TECH mailing list
>>>  GO-ESSP-TECH at ucar.edu
>>>  http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>>  _______________________________________________
>>  GO-ESSP-TECH mailing list
>>  GO-ESSP-TECH at ucar.edu
>>  http://mailman.ucar.edu/mailman/listinfo/go-essp-tech
>
>
>


-- 
------------------ DKRZ / Data Management ------------------
Martina Stockhause	
Deutsches Klimarechenzentrum	phone:	+49-40-460094-122
Bundesstr. 45a			FAX:	+49-40-460094-106
D-20146 Hamburg, Germany	e-mail:	stockhause at dkrz.de
------------------------------------------------------------


