[Go-essp-tech] noaa node is not working...
Cinquini, Luca (3880)
Luca.Cinquini at jpl.nasa.gov
Wed Feb 29 11:26:11 MST 2012
Hi Serguei,
I just sent an email to the esgf-devel list asking for a volunteer to install the datanode-only setup. I re-installed it today and it works fine for me. I'll be happy to work with you on the GFDL data node installation, but before we start, can you remind me what the problems with your current data node are? Is it an authorization problem, or something different?
thanks, Luca
On Feb 29, 2012, at 11:22 AM, Serguei Nikonov wrote:
> Hi Luca,
>
> is there any good news about the data node release? We are looking forward to it because the GFDL
> data node is practically nonfunctional, with a rate of successful requests of ~1% (I
> wrote about it recently on the go-essp-tech list). Keeping in mind that right now is a very
> busy time for users (everyone wants to get data before the Hawaii meeting), we are very
> anxious about this problem.
>
> Thanks,
> Sergey
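
One way to put a number on the success rate mentioned above is to tally HTTP status codes from repeated download attempts. The following is only a sketch: `success_rate` is a hypothetical helper, not part of the ESGF software, and it assumes the status codes arrive one per line on stdin.

```shell
#!/bin/sh
# Hypothetical helper (not part of ESGF): reads one HTTP status code per
# line on stdin and prints "<successes>/<total>". Only 200 counts as success.
success_rate() {
    awk '
        NF        { total++ }   # every non-empty line is one request
        $1 == 200 { ok++ }      # 200 is the only code treated as success
        END {
            if (total) printf "%d/%d\n", ok, total
            else exit 1         # no input at all: signal failure
        }
    '
}
```

The codes could be collected with something like `curl -s -o /dev/null -w '%{http_code}\n' "$URL"` run in a loop and piped into `success_rate`, where `$URL` is an assumed variable holding a file URL on the data node and any required credentials are already handled.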
>
>
> On 02/23/2012 05:13 PM, Luca Cinquini wrote:
>> Hi Serguei,
>> sorry I haven't replied to this so far... I think we should wait to tackle this
>> till next week, when you can install the new release of the data node. At that
>> point, we'll know for sure what software you are running, and there should be
>> enough debug statements in the logs to figure out what's wrong.
>> So please be patient, and bug us again by mid week if you haven't heard from us.
>> thanks, Luca
>>
>> On Thu, Feb 23, 2012 at 7:45 AM, Serguei Nikonov <serguei.nikonov at noaa.gov> wrote:
>>
>> Hi Estani,
>>
>> we increased the memory allocation 2 months ago. Unfortunately, the main issue we
>> had, the 403 error, is still here.
>>
>> Sergey
>>
>>
>> On 02/23/2012 02:46 AM, Estanislao Gonzalez wrote:
>>
>> Hi Hans,
>>
>> I was about to suggest using /usr/local/java :-)
>>
>> Don't worry about the wms config error... I have that too... As we don't use it,
>> there's no real harm, though indeed it would be great to set it up properly.
>>
>> From your mail I'd think you are ready to go. I haven't completely understood
>> whether you have increased the memory allocation or it was already at the values you
>> show.
>> If the former is true, then the memory and open file increase should solve all
>> the problems you were having.
>>
>> Cheers,
>> Estani
>>
>> Am 23.02.2012 03:31, schrieb Hans Vahlenkamp:
>>
>> Some more information... I found why running jmap was failing. We have
>> another Java version installed, the one provided by Red Hat with RHEL 5, which
>> perhaps should be removed. Using jmap from the Java version provided with
>> the data node software works.
>>
>> [root at data2 ~]# sudo /usr/local/java/bin/jmap -heap $(sudo jps | grep -iv jps)
>> Attaching to process ID 28509, please wait...
>> Debugger attached successfully.
>> Server compiler detected.
>> JVM version is 19.0-b09
>>
>> using thread-local object allocation.
>> Parallel GC with 10 thread(s)
>>
>> Heap Configuration:
>> MinHeapFreeRatio = 40
>> MaxHeapFreeRatio = 70
>> MaxHeapSize = 17179869184 (16384.0MB)
>> NewSize = 1310720 (1.25MB)
>> MaxNewSize = 17592186044415 MB
>> OldSize = 5439488 (5.1875MB)
>> NewRatio = 2
>> SurvivorRatio = 8
>> PermSize = 21757952 (20.75MB)
>> MaxPermSize = 536870912 (512.0MB)
>>
>> Heap Usage:
>> PS Young Generation
>> Eden Space:
>> capacity = 4295032832 (4096.0625MB)
>> used = 3749511000 (3575.812339782715MB)
>> free = 545521832 (520.2501602172852MB)
>> 87.29877387815041% used
>> From Space:
>> capacity = 715784192 (682.625MB)
>> used = 715764432 (682.6061553955078MB)
>> free = 19760 (0.0188446044921875MB)
>> 99.99723939139466% used
>> To Space:
>> capacity = 715784192 (682.625MB)
>> used = 0 (0.0MB)
>> free = 715784192 (682.625MB)
>> 0.0% used
>> PS Old Generation
>> capacity = 11453267968 (10922.6875MB)
>> used = 4231302192 (4035.284225463867MB)
>> free = 7221965776 (6887.403274536133MB)
>> 36.94406001694974% used
>> PS Perm Generation
>> capacity = 85262336 (81.3125MB)
>> used = 85221144 (81.2732162475586MB)
>> free = 41192 (0.03928375244140625MB)
>> 99.95168792935722% used
>>
>> However, we are still getting frequent "HTTP Status 403 - Access Denied."
>> failures when trying to download files directly from our local TDS.
>>
>> Hans
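
When reading `jmap -heap` output like the above, note that the "% used" figure for PS Perm Generation is relative to the current capacity, which can still grow up to MaxPermSize. The sketch below relates PermGen use to MaxPermSize instead. `perm_pct_of_max` is a hypothetical helper name, and the parsing assumes the field layout shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read `jmap -heap` output on stdin and print PermGen
# use as a percentage of MaxPermSize (not of the current capacity).
perm_pct_of_max() {
    awk '
        # Strip leading spaces or a mail-quote prefix such as ">> ".
        { sub(/^[> ]+/, "") }
        $1 == "MaxPermSize"     { max = $3 }           # size in bytes
        /^PS Perm Generation/   { in_perm = 1; next }  # enter PermGen section
        in_perm && $1 == "used" { used = $3; in_perm = 0 }
        END {
            if (max > 0 && used != "") printf "%.1f\n", used / max * 100
            else exit 1   # expected fields were not found
        }
    '
}
```

With the figures quoted above (85221144 bytes used, MaxPermSize 536870912), this comes out at about 16%, so by this measure the PermGen limit is not close to being exhausted.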
>>
>>
>> On 02/22/2012 08:21 PM, Hans Vahlenkamp wrote:
>>
>> Hello Estani,
>>
>> After restarting our data node, the ORP address
>> "https://esgdata.gfdl.noaa.gov/OpenidRelyingParty/home.htm"
>> is functioning again. Trying to see the memory map of the Java process is
>> currently failing:
>>
>> [root at data2 bin]# sudo jmap -heap $(sudo jps | grep -iv jps)
>> Attaching to process ID 28509, please wait...
>> Exception in thread "main" java.lang.reflect.InvocationTargetException
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at sun.tools.jmap.JMap.runTool(JMap.java:179)
>> at sun.tools.jmap.JMap.main(JMap.java:110)
>> Caused by: sun.jvm.hotspot.runtime.VMVersionMismatchException: Supported versions are 19.1-b02. Target VM is 19.0-b09
>> at sun.jvm.hotspot.runtime.VM.checkVMVersion(VM.java:224)
>> at sun.jvm.hotspot.runtime.VM.<init>(VM.java:287)
>> at sun.jvm.hotspot.runtime.VM.initialize(VM.java:357)
>> at sun.jvm.hotspot.bugspot.BugSpotAgent.setupVM(BugSpotAgent.java:594)
>> at sun.jvm.hotspot.bugspot.BugSpotAgent.go(BugSpotAgent.java:494)
>> at sun.jvm.hotspot.bugspot.BugSpotAgent.attach(BugSpotAgent.java:332)
>> at sun.jvm.hotspot.tools.Tool.start(Tool.java:163)
>> at sun.jvm.hotspot.tools.HeapSummary.main(HeapSummary.java:39)
>> ... 6 more
>>
>> although I recall that it worked previously.
>>
>> I'm not sure if this is a related problem, but after a restart we noticed
>> these entries in the "catalina.err" file:
>>
>> SEVERE: StandardWrapper.Throwable
>> org.springframework.beans.factory.BeanCreationException: Error creating bean
>> with name 'wmsController' defined in ServletContext resource
>> [/WEB-INF/wms-servlet.xml]: Invocation of init method failed; nested
>> exception is thredds.server.wms.config.WmsConfigException: Could not find
>> wmsConfig.xml
>> at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1336)
>> at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:471)
>> at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory$1.run(AbstractAutowireCapableBeanFactory.java:409)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:380)
>> at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:264)
>> at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:220)
>> at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:261)
>> at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:185)
>> at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:164)
>> at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:429)
>> at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:729)
>> at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:381)
>> at org.springframework.web.servlet.FrameworkServlet.createWebApplicationContext(FrameworkServlet.java:402)
>> at org.springframework.web.servlet.FrameworkServlet.initWebApplicationContext(FrameworkServlet.java:316)
>> at org.springframework.web.servlet.FrameworkServlet.initServletBean(FrameworkServlet.java:282)
>> at org.springframework.web.servlet.HttpServletBean.init(HttpServletBean.java:126)
>> at javax.servlet.GenericServlet.init(GenericServlet.java:212)
>> at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1173)
>> at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:993)
>> at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:4420)
>> at org.apache.catalina.core.StandardContext.start(StandardContext.java:4733)
>> at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
>> at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
>> at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
>> at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
>> at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
>> at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
>> at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1315)
>> at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
>> at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
>> at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1061)
>> at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
>> at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
>> at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
>> at org.apache.catalina.core.StandardService.start(StandardService.java:525)
>> at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
>> at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:219)
>> Caused by: thredds.server.wms.config.WmsConfigException: Could not find wmsConfig.xml
>> at thredds.server.wms.ThreddsWmsController.init(ThreddsWmsController.java:99)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeCustomInitMethod(AbstractAutowireCapableBeanFactory.java:1412)
>> at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1373)
>> at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1333)
>> ... 47 more
>>
>> I'm not sure why this is occurring since the
>> "/usr/local/apache-tomcat-6.0.32/content/thredds/wmsConfig.xml"
>> file exists.
>>
>> We increased the maximum number of open files to 4096 for the tomcat user and
>> have the Java memory options set with "-Xmx16384m -Xms16384m
>> -XX:MaxPermSize=512m".
>> Also, the last software update we did on the data node was in December.
>>
>> Thanks,
>>
>> Hans and Sergey
>>
>>
>> On 02/22/2012 10:00 AM, Estanislao Gonzalez wrote:
>>
>> Hi Sergei,
>>
>> I have a terrible memory so don't ask for the impossible ;-)
>>
>> I guess there are multiple things happening at the same time, which is
>> pretty standard in the CMIP5 context...
>>
>> You do have a problem that has nothing to do with pcmdi3 being overloaded
>> (which it is, and that causes many other problems).
>>
>> I've followed the link you sent and got a 500 error, which is not good at all...
>>
>> java.lang.NoClassDefFoundError: org/springframework/web/util/UriUtils
>> org.springframework.web.util.UrlPathHelper.decodeRequestString(UrlPathHelper.java:307)
>> org.springframework.web.util.UrlPathHelper.getContextPath(UrlPathHelper.java:213)
>> org.springframework.web.util.UrlPathHelper.getPathWithinApplication(UrlPathHelper.java:163)
>>
>> This was thrown by the ORP... (see
>> https://esgdata.gfdl.noaa.gov/OpenidRelyingParty/home.htm)
>> Maybe Luca can help with this... but tell me something, have you updated or
>> changed the node in any particular way?
>> There are multiple possible causes... one being that you ran out of memory in the
>> PermGen space, and the class just could not be loaded anymore...
>> Could you check the catalina logs? If they are huge, stop the node, move the
>> logs somewhere (tag them with a timestamp in the name)
>> and let it create new ones; that will make debugging easier.
>>
>> There should be a message telling why the ORP is not working anymore...
>>
>> Also could you send me this:
>> # information on tomcat, provided it's the only java server in the node:
>> sudo jmap -heap $(sudo jps | grep -iv jps)
>> # current java parameters
>> grep Xmx /etc/esg.env
>>
>> This is what I have to compare:
>> $ grep Xmx /etc/esg.env
>> export JAVA_OPTS="-Xmx15G -Xms10G -XX:MaxPermSize=512m -XX:NewRatio=9"
>> $ sudo jmap -heap $(sudo jps | grep -iv jps)
>> Attaching to process ID 15222, please wait...
>> Debugger attached successfully.
>> Server compiler detected.
>> JVM version is 19.0-b09
>>
>> using thread-local object allocation.
>> Parallel GC with 8 thread(s)
>>
>> Heap Configuration:
>> MinHeapFreeRatio = 40
>> MaxHeapFreeRatio = 70
>> MaxHeapSize = 16106127360 (15360.0MB)
>> NewSize = 1310720 (1.25MB)
>> MaxNewSize = 17592186044415 MB
>> OldSize = 5439488 (5.1875MB)
>> NewRatio = 9
>> SurvivorRatio = 8
>> PermSize = 21757952 (20.75MB)
>> MaxPermSize = 536870912 (512.0MB)
>>
>> Heap Usage:
>> PS Young Generation
>> Eden Space:
>> capacity = 757792768 (722.6875MB)
>> used = 332487616 (317.08489990234375MB)
>> free = 425305152 (405.60260009765625MB)
>> 43.875796924997836% used
>> From Space:
>> capacity = 152043520 (145.0MB)
>> used = 82383968 (78.56747436523438MB)
>> free = 69659552 (66.43252563476562MB)
>> 54.184465079471984% used
>> To Space:
>> capacity = 157483008 (150.1875MB)
>> used = 0 (0.0MB)
>> free = 157483008 (150.1875MB)
>> 0.0% used
>> PS Old Generation
>> capacity = 9663676416 (9216.0MB)
>> used = 7712867544 (7355.563682556152MB)
>> free = 1950808872 (1860.4363174438477MB)
>> 79.81297398606937% used
>> PS Perm Generation
>> capacity = 79757312 (76.0625MB)
>> used = 79734640 (76.04087829589844MB)
>> free = 22672 (0.0216217041015625MB)
>> 99.97157376617707% used
>>
>> Thanks,
>> Estani
>>
>> Am 22.02.2012 15:35, schrieb Serguei Nikonov:
>>
>> Hi Estani,
>>
>> Indeed, for the last two days we experienced the "Too many open files" error in Tomcat.
>> That made us clear out the catalina log file, which was so big that it filled up
>> the hard drive. Thanks for your advice on how to prevent this issue. Currently, TDS
>> is up but has limited functionality. Very often (maybe most of the time) files
>> are not accessible: TDS gives a "403 - Access Denied" error when trying to
>> download them. We had this issue 3 months ago. You may remember it
>> because you were the first to point out this problem. It's
>> interesting that at the same time the files are downloadable from the pcmdi gateway
>> but not directly from the data node TDS. Bob fixed it that time on the pcmdi side. Now we
>> have very similar symptoms, but he is sure that this is because the gateway is
>> too busy.
>>
>> Just for example:
>> the file in this dataset
>> http://esgdata.gfdl.noaa.gov/thredds/esgcet/1/cmip5.output1.NOAA-GFDL.GFDL-CM3.1pctCO2.mon.atmos.Amon.r1i1p1.v20110601.html?dataset=cmip5.output1.NOAA-GFDL.GFDL-CM3.1pctCO2.mon.atmos.Amon.r1i1p1.v20110601.ccb_Amon_GFDL-CM3_1pctCO2_r1i1p1_002601-003012.nc
>>
>> was accessible when I tried before writing this email and now is not.
>>
>> At the same time, it's downloadable from the pcmdi gateway all the time.
>>
>> regards,
>> Sergey
>>
>>
>> On 02/22/2012 05:41 AM, Estanislao Gonzalez wrote:
>>
>> Hi Sergei,
>>
>> Your node is down because of too many open files:
>> http://esgdata.gfdl.noaa.gov/esgf-node-manager/
>>
>> java.io.FileNotFoundException: /usr/local/apache-tomcat-6.0.32/webapps/esgf-node-manager/index.html (Too many open files)
>> java.io.FileInputStream.open(Native Method)
>> java.io.FileInputStream.<init>(FileInputStream.java:106)
>> org.apache.naming.resources.FileDirContext$FileResource.streamContent(FileDirContext.java:927)
>> org.apache.catalina.servlets.DefaultServlet.copy(DefaultServlet.java:1832)
>> org.apache.catalina.servlets.DefaultServlet.serveResource(DefaultServlet.java:919)
>> org.apache.catalina.servlets.DefaultServlet.doGet(DefaultServlet.java:398)
>> javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
>> javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
>>
>> The catalina logs should also have messages regarding this.
>> This is how you can prevent it from happening again:
>> http://esgf.org/wiki/ESGFNode/FAQ#Tomcat_is_complaining_about_too_many_open_files
>>
>> Now, you'll have to restart the node.
>>
>> Thanks,
>> Estani
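
A raised open-file limit only takes effect for processes started after the change, so it can help to check what the running Tomcat JVM actually has. The sketch below is one possible check. `fd_usage` is a hypothetical helper, and it assumes a Linux system where `/proc/<pid>/fd` and `/proc/<pid>/limits` are available and readable (typically as root or the process owner).

```shell
#!/bin/sh
# Hypothetical helper: print "<open-descriptors> <soft-limit>" for a PID.
# Assumes Linux /proc; prints "unknown" for the limit if it cannot be read.
fd_usage() {
    pid=$1
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    # In /proc/<pid>/limits, the "Max open files" row has the soft limit
    # in the fourth whitespace-separated column.
    limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits" 2>/dev/null) || limit=unknown
    echo "$open $limit"
}
```

For instance, `fd_usage 28509` would report on the JVM with the process ID that appears in the jmap output earlier in this thread. A first number close to the second suggests the process is about to hit "Too many open files" again.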
>>
>>
>>
>>
>>
>>
>>
>