[Go-essp-tech] noaa node is not working...
Serguei Nikonov
serguei.nikonov at noaa.gov
Wed Feb 29 11:22:15 MST 2012
Hi Luca,
is there any good news about the data node release? We are looking forward to it
because the GFDL data node is practically nonfunctional, with a successful-request
rate of ~1% (I wrote about it recently on the go-essp-tech list). Keeping in mind
that right now is a very busy time for users - everyone wants to get data before
the Hawaii meeting - we are very anxious about this problem.
Thanks,
Sergey
On 02/23/2012 05:13 PM, Luca Cinquini wrote:
> Hi Serguei,
> sorry I haven't replied to this so far... I think we should wait to tackle this
> till next week, when you can install the new release of the data node. At that
> point, we'll know for sure what software you are running, and there should be
> enough debug statements in the logs to figure out what's wrong.
> So please be patient, and bug us again by midweek if you haven't heard from us.
> thanks, Luca
>
> On Thu, Feb 23, 2012 at 7:45 AM, Serguei Nikonov <serguei.nikonov at noaa.gov> wrote:
>
> Hi Estani,
>
> we increased the memory allocation 2 months ago. Unfortunately the main issue
> we had, the 403 error, is still here.
>
> Sergey
>
>
> On 02/23/2012 02:46 AM, Estanislao Gonzalez wrote:
>
> Hi Hans,
>
> I was about to suggest using /usr/local/java :-)
>
> Don't worry about the wms config error... I have that too... As we don't
> use it,
> there's no real harm, though indeed it would be great to set it up properly.
>
> From your mail I'd think you are ready to go. I haven't completely understood
> whether you have increased the memory allocation or it was already at the
> values you show.
> If the former is true, then the memory and open-file increases should solve
> all the problems you were having.
>
> Cheers,
> Estani
>
> Am 23.02.2012 03:31, schrieb Hans Vahlenkamp:
>
> Some more information... I found out why running jmap was failing. We have
> another Java version installed (the one provided by Red Hat with RHEL 5),
> which perhaps should be removed. Using jmap from the Java version provided
> with the data node software works.
>
> [root@data2 ~]# sudo /usr/local/java/bin/jmap -heap $(sudo jps | grep -iv jps)
> Attaching to process ID 28509, please wait...
> Debugger attached successfully.
> Server compiler detected.
> JVM version is 19.0-b09
>
> using thread-local object allocation.
> Parallel GC with 10 thread(s)
>
> Heap Configuration:
> MinHeapFreeRatio = 40
> MaxHeapFreeRatio = 70
> MaxHeapSize = 17179869184 (16384.0MB)
> NewSize = 1310720 (1.25MB)
> MaxNewSize = 17592186044415 MB
> OldSize = 5439488 (5.1875MB)
> NewRatio = 2
> SurvivorRatio = 8
> PermSize = 21757952 (20.75MB)
> MaxPermSize = 536870912 (512.0MB)
>
> Heap Usage:
> PS Young Generation
> Eden Space:
> capacity = 4295032832 (4096.0625MB)
> used = 3749511000 (3575.812339782715MB)
> free = 545521832 (520.2501602172852MB)
> 87.29877387815041% used
> From Space:
> capacity = 715784192 (682.625MB)
> used = 715764432 (682.6061553955078MB)
> free = 19760 (0.0188446044921875MB)
> 99.99723939139466% used
> To Space:
> capacity = 715784192 (682.625MB)
> used = 0 (0.0MB)
> free = 715784192 (682.625MB)
> 0.0% used
> PS Old Generation
> capacity = 11453267968 (10922.6875MB)
> used = 4231302192 (4035.284225463867MB)
> free = 7221965776 (6887.403274536133MB)
> 36.94406001694974% used
> PS Perm Generation
> capacity = 85262336 (81.3125MB)
> used = 85221144 (81.2732162475586MB)
> free = 41192 (0.03928375244140625MB)
> 99.95168792935722% used
>
> However, we are still getting frequent "HTTP Status 403 - Access
> Denied."
> failures when trying to download files directly from our local TDS.
>
> Hans
>
>
> On 02/22/2012 08:21 PM, Hans Vahlenkamp wrote:
>
> Hello Estani,
>
> After restarting our data node, the ORP address
> "https://esgdata.gfdl.noaa.gov/OpenidRelyingParty/home.htm"
> is functioning again. Trying to see the memory map of the Java process is
> currently failing:
>
> [root@data2 bin]# sudo jmap -heap $(sudo jps | grep -iv jps)
> Attaching to process ID 28509, please wait...
> Exception in thread "main" java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at sun.tools.jmap.JMap.runTool(JMap.java:179)
>         at sun.tools.jmap.JMap.main(JMap.java:110)
> Caused by: sun.jvm.hotspot.runtime.VMVersionMismatchException: Supported versions are 19.1-b02. Target VM is 19.0-b09
>         at sun.jvm.hotspot.runtime.VM.checkVMVersion(VM.java:224)
>         at sun.jvm.hotspot.runtime.VM.<init>(VM.java:287)
>         at sun.jvm.hotspot.runtime.VM.initialize(VM.java:357)
>         at sun.jvm.hotspot.bugspot.BugSpotAgent.setupVM(BugSpotAgent.java:594)
>         at sun.jvm.hotspot.bugspot.BugSpotAgent.go(BugSpotAgent.java:494)
>         at sun.jvm.hotspot.bugspot.BugSpotAgent.attach(BugSpotAgent.java:332)
>         at sun.jvm.hotspot.tools.Tool.start(Tool.java:163)
>         at sun.jvm.hotspot.tools.HeapSummary.main(HeapSummary.java:39)
>         ... 6 more
>
> although I recall that it worked previously.
>
> I'm not sure if this is a related problem, but after a restart
> we noticed in
> the "catalina.err" file these entries:
>
> SEVERE: StandardWrapper.Throwable
> org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'wmsController' defined in ServletContext resource [/WEB-INF/wms-servlet.xml]: Invocation of init method failed; nested exception is thredds.server.wms.config.WmsConfigException: Could not find wmsConfig.xml
>         at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1336)
>         at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:471)
>         at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory$1.run(AbstractAutowireCapableBeanFactory.java:409)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:380)
>         at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:264)
>         at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:220)
>         at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:261)
>         at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:185)
>         at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:164)
>         at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:429)
>         at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:729)
>         at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:381)
>         at org.springframework.web.servlet.FrameworkServlet.createWebApplicationContext(FrameworkServlet.java:402)
>         at org.springframework.web.servlet.FrameworkServlet.initWebApplicationContext(FrameworkServlet.java:316)
>         at org.springframework.web.servlet.FrameworkServlet.initServletBean(FrameworkServlet.java:282)
>         at org.springframework.web.servlet.HttpServletBean.init(HttpServletBean.java:126)
>         at javax.servlet.GenericServlet.init(GenericServlet.java:212)
>         at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1173)
>         at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:993)
>         at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:4420)
>         at org.apache.catalina.core.StandardContext.start(StandardContext.java:4733)
>         at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
>         at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
>         at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
>         at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
>         at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
>         at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
>         at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1315)
>         at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
>         at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
>         at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1061)
>         at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
>         at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
>         at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
>         at org.apache.catalina.core.StandardService.start(StandardService.java:525)
>         at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
>         at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:219)
> Caused by: thredds.server.wms.config.WmsConfigException: Could not find wmsConfig.xml
>         at thredds.server.wms.ThreddsWmsController.init(ThreddsWmsController.java:99)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeCustomInitMethod(AbstractAutowireCapableBeanFactory.java:1412)
>         at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1373)
>         at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1333)
>         ... 47 more
>
> I'm not sure why this is occurring, since the
> "/usr/local/apache-tomcat-6.0.32/content/thredds/wmsConfig.xml" file exists.
>
> We increased the maximum number of open files to 4096 for the
> tomcat user and
> have the Java memory options set with "-Xmx16384m -Xms16384m
> -XX:MaxPermSize=512m".
> Also, the last software update we did on the data node was in
> December.
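[Editor's note: the limits described above are typically applied as follows. This is a sketch only; the limits.conf entries and values are assumptions based on the figures quoted in this thread, not the node's actual configuration.]

```shell
# /etc/security/limits.conf -- per-user open-file limits for the tomcat
# user (illustrative values; 4096 matches the figure quoted above):
#   tomcat  soft  nofile  4096
#   tomcat  hard  nofile  4096

# JVM memory options as reported in this thread. MaxPermSize caps the
# permanent generation, which the jmap output above shows running near
# 100% full.
export JAVA_OPTS="-Xmx16384m -Xms16384m -XX:MaxPermSize=512m"
echo "$JAVA_OPTS"
```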
>
> Thanks,
>
> Hans and Sergey
>
>
> On 02/22/2012 10:00 AM, Estanislao Gonzalez wrote:
>
> Hi Sergei,
>
> I have a terrible memory so don't ask for the impossible ;-)
>
> I guess there are multiple things happening at the same
> time, which is
> pretty standard in CMIP5 context...
>
> You do have a problem that has nothing to do with pcmdi3
> being overloaded,
> which it is and causes many other problems.
>
> I've followed the link you sent and got a 500 error, which is nowhere near
> good...
>
> java.lang.NoClassDefFoundError: org/springframework/web/util/UriUtils
>         org.springframework.web.util.UrlPathHelper.decodeRequestString(UrlPathHelper.java:307)
>         org.springframework.web.util.UrlPathHelper.getContextPath(UrlPathHelper.java:213)
>         org.springframework.web.util.UrlPathHelper.getPathWithinApplication(UrlPathHelper.java:163)
>
>
> This was thrown by the ORP... (see
> https://esgdata.gfdl.noaa.gov/OpenidRelyingParty/home.htm)
> Maybe Luca can help with this... but tell me something, have
> you updated or
> changed the node in any particular way?
> There are multiple possible causes... one being you ran out
> of memory in the
> PermGen space, and the class could just not be loaded anymore...
> Could you check the catalina logs? If they are huge, stop the node, move
> the logs somewhere (tag them with a timestamp in the name) and let it
> create new ones; that will make debugging easier.
>
> There should be a message telling why the ORP is not working
> anymore...
>
> Also, could you send me this:
> # information on tomcat, provided it's the only java server on the node:
> sudo jmap -heap $(sudo jps | grep -iv jps)
> # current java parameters
> grep Xmx /etc/esg.env
>
> This is what I have to compare:
> $ grep Xmx /etc/esg.env
> export JAVA_OPTS="-Xmx15G -Xms10G -XX:MaxPermSize=512m
> -XX:NewRatio=9"
> $ sudo jmap -heap $(sudo jps | grep -iv jps)
> Attaching to process ID 15222, please wait...
> Debugger attached successfully.
> Server compiler detected.
> JVM version is 19.0-b09
>
> using thread-local object allocation.
> Parallel GC with 8 thread(s)
>
> Heap Configuration:
> MinHeapFreeRatio = 40
> MaxHeapFreeRatio = 70
> MaxHeapSize = 16106127360 (15360.0MB)
> NewSize = 1310720 (1.25MB)
> MaxNewSize = 17592186044415 MB
> OldSize = 5439488 (5.1875MB)
> NewRatio = 9
> SurvivorRatio = 8
> PermSize = 21757952 (20.75MB)
> MaxPermSize = 536870912 (512.0MB)
>
> Heap Usage:
> PS Young Generation
> Eden Space:
> capacity = 757792768 (722.6875MB)
> used = 332487616 (317.08489990234375MB)
> free = 425305152 (405.60260009765625MB)
> 43.875796924997836% used
> From Space:
> capacity = 152043520 (145.0MB)
> used = 82383968 (78.56747436523438MB)
> free = 69659552 (66.43252563476562MB)
> 54.184465079471984% used
> To Space:
> capacity = 157483008 (150.1875MB)
> used = 0 (0.0MB)
> free = 157483008 (150.1875MB)
> 0.0% used
> PS Old Generation
> capacity = 9663676416 (9216.0MB)
> used = 7712867544 (7355.563682556152MB)
> free = 1950808872 (1860.4363174438477MB)
> 79.81297398606937% used
> PS Perm Generation
> capacity = 79757312 (76.0625MB)
> used = 79734640 (76.04087829589844MB)
> free = 22672 (0.0216217041015625MB)
> 99.97157376617707% used
>
> Thanks,
> Estani
>
> Am 22.02.2012 15:35, schrieb Serguei Nikonov:
>
> Hi Estani,
>
> Indeed, over the last two days we experienced the "Too many open files"
> error in tomcat. That made us clear out the catalina log file, which had
> grown so big that it was overloading the hard drive. Thanks for your
> advice on how to prevent this issue. Currently, TDS is up but has limited
> functionality. Very often (maybe most of the time) files are not
> accessible - TDS gives a "403 - Access Denied" error when we try to
> download them. We had this issue 3 months ago. You may remember it,
> because you were the first to point out this problem. It's interesting
> that at the same time the files are downloadable from the pcmdi gateway
> but not directly from the data node TDS. Bob fixed it that time on the
> pcmdi side. Now we have very similar symptoms, but he is sure that this
> is because the gateway is too busy.
>
> Just for example: a file in this dataset,
> http://esgdata.gfdl.noaa.gov/thredds/esgcet/1/cmip5.output1.NOAA-GFDL.GFDL-CM3.1pctCO2.mon.atmos.Amon.r1i1p1.v20110601.html?dataset=cmip5.output1.NOAA-GFDL.GFDL-CM3.1pctCO2.mon.atmos.Amon.r1i1p1.v20110601.ccb_Amon_GFDL-CM3_1pctCO2_r1i1p1_002601-003012.nc
> was accessible when I tried it before writing this email, and now it is not.
>
> At the same time, it is downloadable from the pcmdi gateway all the time.
>
> regards,
> Sergey
>
>
> On 02/22/2012 05:41 AM, Estanislao Gonzalez wrote:
>
> Hi Sergei,
>
> Your node is down because of too many open files:
> http://esgdata.gfdl.noaa.gov/esgf-node-manager/
>
> java.io.FileNotFoundException:
> /usr/local/apache-tomcat-6.0.32/webapps/esgf-node-manager/index.html
> (Too many open files)
>         java.io.FileInputStream.open(Native Method)
>         java.io.FileInputStream.<init>(FileInputStream.java:106)
>         org.apache.naming.resources.FileDirContext$FileResource.streamContent(FileDirContext.java:927)
>         org.apache.catalina.servlets.DefaultServlet.copy(DefaultServlet.java:1832)
>         org.apache.catalina.servlets.DefaultServlet.serveResource(DefaultServlet.java:919)
>         org.apache.catalina.servlets.DefaultServlet.doGet(DefaultServlet.java:398)
>         javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
>         javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
>
> The catalina logs should also have messages regarding this.
> This is how you can prevent it from happening again:
> http://esgf.org/wiki/ESGFNode/FAQ#Tomcat_is_complaining_about_too_many_open_files
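[Editor's note: two quick checks can help confirm this condition. A sketch; the process-lookup pipeline assumes jps prints a line like "28509 Bootstrap" for the Tomcat JVM, as in the outputs quoted in this thread.]

```shell
# Soft limit on open file descriptors for the current user; the tomcat
# user's limit is the one that matters for the data node.
ulimit -n

# If the JDK tools are available, count the descriptors the Tomcat JVM
# currently holds (jps prints "PID Bootstrap"; take the first field).
if command -v jps >/dev/null 2>&1; then
    pid=$(jps 2>/dev/null | grep -iv jps | awk '{print $1}' | head -n 1)
    if [ -n "$pid" ] && [ -d "/proc/$pid/fd" ]; then
        ls "/proc/$pid/fd" | wc -l
    fi
fi
```

If the descriptor count sits near the limit, the "Too many open files" errors above are the expected symptom.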
>
> Now, you'll have to restart the node.
>
> Thanks,
> Estani
>
More information about the GO-ESSP-TECH
mailing list