[Go-essp-tech] noaa node is not working...

Serguei Nikonov serguei.nikonov at noaa.gov
Wed Feb 29 11:22:15 MST 2012


Hi Luca,

Is there any good news about the data node release? We are looking forward to it
because the GFDL data node is practically nonfunctional, with a success rate of
about 1% of requests (I wrote about this recently on the go-essp-tech list).
This is a very busy time for users, since everyone wants to get data before the
Hawaii meeting, so we are very anxious about this problem.

Thanks,
Sergey


On 02/23/2012 05:13 PM, Luca Cinquini wrote:
> Hi Serguei,
> sorry I haven't replied to this so far... I think we should wait to tackle this
> till next week, when you can install the new release of the data node. At that
> point, we'll know for sure what software you are running, and there should be
> enough debug statements in the logs to figure out what's wrong.
> So please be patient, and bug us again by mid week if you haven't heard from us.
> thanks, Luca
>
> On Thu, Feb 23, 2012 at 7:45 AM, Serguei Nikonov <serguei.nikonov at noaa.gov> wrote:
>
>     Hi Estani,
>
>     We increased the memory allocation 2 months ago. Unfortunately, the main
>     issue we had, the 403 error, is still here.
>
>     Sergey
>
>
>     On 02/23/2012 02:46 AM, Estanislao Gonzalez wrote:
>
>         Hi Hans,
>
>         I was about to suggest using /usr/local/java :-)
>
>         Don't worry about the wms config error... I have that too... As we don't
>         use it, there's no real harm, though indeed it would be great to set it
>         up properly.
>
>         From your mail I'd think you are ready to go. I haven't completely
>         understood whether you have increased the memory allocation or it was
>         already at the values you show.
>         If the former is true, then the memory and open file increase should
>         solve all the problems you were having.
>
>         Cheers,
>         Estani
>
>         Am 23.02.2012 03:31, schrieb Hans Vahlenkamp:
>
>             Some more information... I found why running jmap was failing. We have
>             another Java version installed, the one provided by Red Hat with
>             RHEL 5, which perhaps should be removed. Using jmap from the Java
>             version provided with the data node software works.
>
>             [root@data2 ~]# sudo /usr/local/java/bin/jmap -heap $(sudo jps | grep -iv jps)
>             Attaching to process ID 28509, please wait...
>             Debugger attached successfully.
>             Server compiler detected.
>             JVM version is 19.0-b09
>
>             using thread-local object allocation.
>             Parallel GC with 10 thread(s)
>
>             Heap Configuration:
>             MinHeapFreeRatio = 40
>             MaxHeapFreeRatio = 70
>             MaxHeapSize = 17179869184 (16384.0MB)
>             NewSize = 1310720 (1.25MB)
>             MaxNewSize = 17592186044415 MB
>             OldSize = 5439488 (5.1875MB)
>             NewRatio = 2
>             SurvivorRatio = 8
>             PermSize = 21757952 (20.75MB)
>             MaxPermSize = 536870912 (512.0MB)
>
>             Heap Usage:
>             PS Young Generation
>             Eden Space:
>             capacity = 4295032832 (4096.0625MB)
>             used = 3749511000 (3575.812339782715MB)
>             free = 545521832 (520.2501602172852MB)
>             87.29877387815041% used
>             From Space:
>             capacity = 715784192 (682.625MB)
>             used = 715764432 (682.6061553955078MB)
>             free = 19760 (0.0188446044921875MB)
>             99.99723939139466% used
>             To Space:
>             capacity = 715784192 (682.625MB)
>             used = 0 (0.0MB)
>             free = 715784192 (682.625MB)
>             0.0% used
>             PS Old Generation
>             capacity = 11453267968 (10922.6875MB)
>             used = 4231302192 (4035.284225463867MB)
>             free = 7221965776 (6887.403274536133MB)
>             36.94406001694974% used
>             PS Perm Generation
>             capacity = 85262336 (81.3125MB)
>             used = 85221144 (81.2732162475586MB)
>             free = 41192 (0.03928375244140625MB)
>             99.95168792935722% used
>
>             However, we are still getting frequent "HTTP Status 403 - Access Denied."
>             failures when trying to download files directly from our local TDS.
>
>             Hans
>
>
>             On 02/22/2012 08:21 PM, Hans Vahlenkamp wrote:
>
>                 Hello Estani,
>
>                 After restarting our data node, the ORP address
>                 "https://esgdata.gfdl.noaa.gov/OpenidRelyingParty/home.htm"
>                 is functioning again. Trying to see the memory map of the Java
>                 process is currently failing:
>
>                 [root@data2 bin]# sudo jmap -heap $(sudo jps | grep -iv jps)
>                 Attaching to process ID 28509, please wait...
>                 Exception in thread "main" java.lang.reflect.InvocationTargetException
>                 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>                 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>                 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>                 at java.lang.reflect.Method.invoke(Method.java:597)
>                 at sun.tools.jmap.JMap.runTool(JMap.java:179)
>                 at sun.tools.jmap.JMap.main(JMap.java:110)
>                 Caused by: sun.jvm.hotspot.runtime.VMVersionMismatchException: Supported versions are 19.1-b02. Target VM is 19.0-b09
>                 at sun.jvm.hotspot.runtime.VM.checkVMVersion(VM.java:224)
>                 at sun.jvm.hotspot.runtime.VM.<init>(VM.java:287)
>                 at sun.jvm.hotspot.runtime.VM.initialize(VM.java:357)
>                 at sun.jvm.hotspot.bugspot.BugSpotAgent.setupVM(BugSpotAgent.java:594)
>                 at sun.jvm.hotspot.bugspot.BugSpotAgent.go(BugSpotAgent.java:494)
>                 at sun.jvm.hotspot.bugspot.BugSpotAgent.attach(BugSpotAgent.java:332)
>                 at sun.jvm.hotspot.tools.Tool.start(Tool.java:163)
>                 at sun.jvm.hotspot.tools.HeapSummary.main(HeapSummary.java:39)
>                 ... 6 more
>
>                 although I recall that it worked previously.
>
>                 I'm not sure if this is a related problem, but after a restart
>                 we noticed in
>                 the "catalina.err" file these entries:
>
>                 SEVERE: StandardWrapper.Throwable
>                 org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'wmsController' defined in ServletContext resource [/WEB-INF/wms-servlet.xml]: Invocation of init method failed; nested exception is thredds.server.wms.config.WmsConfigException: Could not find wmsConfig.xml
>                 at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1336)
>                 at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:471)
>                 at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory$1.run(AbstractAutowireCapableBeanFactory.java:409)
>                 at java.security.AccessController.doPrivileged(Native Method)
>                 at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:380)
>                 at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:264)
>                 at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:220)
>                 at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:261)
>                 at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:185)
>                 at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:164)
>                 at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:429)
>                 at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:729)
>                 at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:381)
>                 at org.springframework.web.servlet.FrameworkServlet.createWebApplicationContext(FrameworkServlet.java:402)
>                 at org.springframework.web.servlet.FrameworkServlet.initWebApplicationContext(FrameworkServlet.java:316)
>                 at org.springframework.web.servlet.FrameworkServlet.initServletBean(FrameworkServlet.java:282)
>                 at org.springframework.web.servlet.HttpServletBean.init(HttpServletBean.java:126)
>                 at javax.servlet.GenericServlet.init(GenericServlet.java:212)
>                 at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1173)
>                 at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:993)
>                 at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:4420)
>                 at org.apache.catalina.core.StandardContext.start(StandardContext.java:4733)
>                 at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
>                 at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
>                 at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
>                 at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
>                 at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
>                 at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
>                 at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1315)
>                 at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
>                 at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
>                 at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1061)
>                 at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
>                 at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
>                 at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
>                 at org.apache.catalina.core.StandardService.start(StandardService.java:525)
>                 at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
>                 at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
>                 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>                 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>                 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>                 at java.lang.reflect.Method.invoke(Method.java:597)
>                 at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
>                 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>                 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>                 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>                 at java.lang.reflect.Method.invoke(Method.java:597)
>                 at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:219)
>                 Caused by: thredds.server.wms.config.WmsConfigException: Could not find wmsConfig.xml
>                 at thredds.server.wms.ThreddsWmsController.init(ThreddsWmsController.java:99)
>                 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>                 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>                 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>                 at java.lang.reflect.Method.invoke(Method.java:597)
>                 at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeCustomInitMethod(AbstractAutowireCapableBeanFactory.java:1412)
>                 at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1373)
>                 at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1333)
>                 ... 47 more
>
>                 I'm not sure why this is occurring since the
>                 "/usr/local/apache-tomcat-6.0.32/content/thredds/wmsConfig.xml"
>                 file exists.
>
>                 We increased the maximum number of open files to 4096 for the
>                 tomcat user and have the Java memory options set with
>                 "-Xmx16384m -Xms16384m -XX:MaxPermSize=512m".
>                 Also, the last software update we did on the data node was in
>                 December.
>
>                 Thanks,
>
>                 Hans and Sergey
>
>
>                 On 02/22/2012 10:00 AM, Estanislao Gonzalez wrote:
>
>                     Hi Sergei,
>
>                     I have a terrible memory so don't ask for the impossible ;-)
>
>                     I guess there are multiple things happening at the same
>                     time, which is pretty standard in the CMIP5 context...
>
>                     You do have a problem that has nothing to do with pcmdi3
>                     being overloaded (which it is, and which causes many other
>                     problems).
>
>                     I've followed the link you sent and got a 500 error, which
>                     is not good at all...
>
>                     java.lang.NoClassDefFoundError: org/springframework/web/util/UriUtils
>                     org.springframework.web.util.UrlPathHelper.decodeRequestString(UrlPathHelper.java:307)
>                     org.springframework.web.util.UrlPathHelper.getContextPath(UrlPathHelper.java:213)
>                     org.springframework.web.util.UrlPathHelper.getPathWithinApplication(UrlPathHelper.java:163)
>
>                     This was thrown by the ORP... (see
>                     https://esgdata.gfdl.noaa.gov/OpenidRelyingParty/home.htm)
>                     Maybe Luca can help with this... but tell me something: have
>                     you updated or changed the node in any particular way?
>                     There are multiple possible causes... one being that you ran
>                     out of memory in the PermGen space, and the class could just
>                     not be loaded anymore...
>                     Could you check the catalina logs? If they are huge, stop
>                     the node, move the logs somewhere (tag them with a timestamp
>                     in the name) and let it create new ones; that will make
>                     debugging easier.
>
>                     There should be a message telling why the ORP is not working
>                     anymore...
>
>                     Also could you send me this:
>                     # information on tomcat, provided it's the only java server in the node:
>                     sudo jmap -heap $(sudo jps | grep -iv jps)
>                     # current java parameters
>                     grep Xmx /etc/esg.env
>
>                     This is what I have to compare:
>                     $ grep Xmx /etc/esg.env
>                     export JAVA_OPTS="-Xmx15G -Xms10G -XX:MaxPermSize=512m -XX:NewRatio=9"
>                     $ sudo jmap -heap $(sudo jps | grep -iv jps)
>                     Attaching to process ID 15222, please wait...
>                     Debugger attached successfully.
>                     Server compiler detected.
>                     JVM version is 19.0-b09
>
>                     using thread-local object allocation.
>                     Parallel GC with 8 thread(s)
>
>                     Heap Configuration:
>                     MinHeapFreeRatio = 40
>                     MaxHeapFreeRatio = 70
>                     MaxHeapSize = 16106127360 (15360.0MB)
>                     NewSize = 1310720 (1.25MB)
>                     MaxNewSize = 17592186044415 MB
>                     OldSize = 5439488 (5.1875MB)
>                     NewRatio = 9
>                     SurvivorRatio = 8
>                     PermSize = 21757952 (20.75MB)
>                     MaxPermSize = 536870912 (512.0MB)
>
>                     Heap Usage:
>                     PS Young Generation
>                     Eden Space:
>                     capacity = 757792768 (722.6875MB)
>                     used = 332487616 (317.08489990234375MB)
>                     free = 425305152 (405.60260009765625MB)
>                     43.875796924997836% used
>                     From Space:
>                     capacity = 152043520 (145.0MB)
>                     used = 82383968 (78.56747436523438MB)
>                     free = 69659552 (66.43252563476562MB)
>                     54.184465079471984% used
>                     To Space:
>                     capacity = 157483008 (150.1875MB)
>                     used = 0 (0.0MB)
>                     free = 157483008 (150.1875MB)
>                     0.0% used
>                     PS Old Generation
>                     capacity = 9663676416 (9216.0MB)
>                     used = 7712867544 (7355.563682556152MB)
>                     free = 1950808872 (1860.4363174438477MB)
>                     79.81297398606937% used
>                     PS Perm Generation
>                     capacity = 79757312 (76.0625MB)
>                     used = 79734640 (76.04087829589844MB)
>                     free = 22672 (0.0216217041015625MB)
>                     99.97157376617707% used
>
>                     Thanks,
>                     Estani
>
>                     Am 22.02.2012 15:35, schrieb Serguei Nikonov:
>
>                         Hi Estani,
>
>                         Indeed, for the last two days we experienced the "Too many
>                         open files" error in tomcat. That made us clear out the
>                         catalina log file, which was so big that it overloaded the
>                         hard drive. Thanks for your advice on how to prevent this
>                         issue. Currently, TDS is up but has limited functionality.
>                         Very often (maybe most of the time) files are not
>                         accessible: TDS gives a "403 - Access Denied" error when
>                         trying to download them. We had this issue 3 months ago.
>                         You will probably remember it, because you were the first
>                         to point out this problem. It's interesting that at the
>                         same time files are downloadable from the pcmdi gateway
>                         but not directly from the data node TDS. Bob fixed it that
>                         time on the pcmdi side. Now we have very similar symptoms,
>                         but he is sure that this is because the gateway is too busy.
>
>                         Just for example, a file in this dataset:
>                         http://esgdata.gfdl.noaa.gov/thredds/esgcet/1/cmip5.output1.NOAA-GFDL.GFDL-CM3.1pctCO2.mon.atmos.Amon.r1i1p1.v20110601.html?dataset=cmip5.output1.NOAA-GFDL.GFDL-CM3.1pctCO2.mon.atmos.Amon.r1i1p1.v20110601.ccb_Amon_GFDL-CM3_1pctCO2_r1i1p1_002601-003012.nc
>
>                         was accessible when I tried it before writing this email
>                         and now is not.
>
>                         At the same time it's downloadable from the pcmdi gateway
>                         all the time.
>
>                         regards,
>                         Sergey
>
>
>                         On 02/22/2012 05:41 AM, Estanislao Gonzalez wrote:
>
>                             Hi Sergei,
>
>                             Your node is down because of too many open files:
>                             http://esgdata.gfdl.noaa.gov/esgf-node-manager/
>
>                             java.io.FileNotFoundException: /usr/local/apache-tomcat-6.0.32/webapps/esgf-node-manager/index.html (Too many open files)
>                             java.io.FileInputStream.open(Native Method)
>                             java.io.FileInputStream.<init>(FileInputStream.java:106)
>                             org.apache.naming.resources.FileDirContext$FileResource.streamContent(FileDirContext.java:927)
>                             org.apache.catalina.servlets.DefaultServlet.copy(DefaultServlet.java:1832)
>                             org.apache.catalina.servlets.DefaultServlet.serveResource(DefaultServlet.java:919)
>                             org.apache.catalina.servlets.DefaultServlet.doGet(DefaultServlet.java:398)
>                             javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
>                             javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
>
>                             The catalina logs should also have messages regarding this.
>                             This is how you can prevent it from happening again:
>                             http://esgf.org/wiki/ESGFNode/FAQ#Tomcat_is_complaining_about_too_many_open_files
>
>                             Now, you'll have to restart the node.
>
>                             Thanks,
>                             Estani
>
>
>
>
>
>
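
For anyone hitting the same "Too many open files" symptom discussed in this thread, here is a minimal sketch of how one might check how close a running Tomcat process is to its open-file limit. It assumes a Linux /proc filesystem and enough privileges to read another user's process entries (e.g. via sudo). The function names fd_percent and fd_report are made up for illustration and are not part of the ESGF tooling.

```shell
#!/bin/sh
# Hypothetical helpers (not part of ESGF): report open-file usage of a process.

# fd_percent OPEN LIMIT: integer percentage of the limit currently in use.
fd_percent() {
    echo $(( $1 * 100 / $2 ))
}

# fd_report PID: read the number of open descriptors and the soft limit
# from /proc (Linux only) and print "open/limit (NN%)".
# Assumes the limit is numeric; a limit of "unlimited" is not handled.
fd_report() {
    pid=$1
    open=$(ls "/proc/$pid/fd" | wc -l)
    limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
    echo "$open/$limit ($(fd_percent "$open" "$limit")%)"
}
```

After sourcing the file, calling fd_report with the Tomcat process ID (the one reported by jps) gives a quick reading that can be compared against the 4096 limit mentioned above.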
>
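
The jmap -heap listings in this thread are long, so a small filter that keeps only the "% used" figure per heap region may make them easier to compare between nodes. The sketch below is only an illustration: the function name heap_summary is hypothetical, and it assumes the output layout shown in the messages above (a region name line followed later by an "NN% used" line). Note that jmap reports usage relative to the currently committed capacity of each region, not to the configured maximum, so a Perm Generation near 100% is not by itself evidence that MaxPermSize has been reached.

```shell
#!/bin/sh
# Hypothetical helper: condense "jmap -heap" output read from stdin into
# one line per heap region, e.g. "Eden Space: 87.3% used".
heap_summary() {
    awk '
        # Remember the most recent region header line.
        /^[ \t]*(Eden Space|From Space|To Space|PS Old Generation|PS Perm Generation)/ {
            region = $0
            sub(/^[ \t]+/, "", region)   # drop leading whitespace
            sub(/:[ \t]*$/, "", region)  # drop trailing colon, if any
        }
        # The percentage line closes the block for that region.
        /% used/ && region != "" {
            printf "%s: %.1f%% used\n", region, $1 + 0
            region = ""
        }
    '
}
```

It would be used by piping the jmap -heap command shown in this thread into heap_summary.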


