[Go-essp-tech] noaa node is not working...

Cinquini, Luca (3880) Luca.Cinquini at jpl.nasa.gov
Wed Feb 29 11:26:11 MST 2012


Hi Serguei,
        I just sent an email to the esgf-devel list asking for a volunteer to install the datanode-only setup. I re-installed it today and it works fine for me. I'll be happy to work with you on the GFDL installation of a data node, but before we start, can you remind me what the problems with your current data node are? Is it an authorization problem, or something different?
thanks, Luca

On Feb 29, 2012, at 11:22 AM, Serguei Nikonov wrote:

> Hi Luca,
>
> is there any good news about the data node release? We are looking forward to it
> because the GFDL data node is practically nonfunctional, with only ~1% of requests
> succeeding (I wrote about it recently on the go-essp-tech list). Keeping in mind
> that right now is a very hot time for users (everyone wants to get data before the
> Hawaii meeting), we are very anxious about this problem.
>
> Thanks,
> Sergey
>
>
> On 02/23/2012 05:13 PM, Luca Cinquini wrote:
>> Hi Serguei,
>> sorry I haven't replied to this so far... I think we should wait to tackle this
>> till next week, when you can install the new release of the data node. At that
>> point, we'll know for sure what software you are running, and there should be
>> enough debug statements in the logs to figure out what's wrong.
>> So please be patient, and bug us again by mid-week if you haven't heard from us.
>> thanks, Luca
>>
>> On Thu, Feb 23, 2012 at 7:45 AM, Serguei Nikonov <serguei.nikonov at noaa.gov> wrote:
>>
>>    Hi Estani,
>>
>>    we increased the memory allocation 2 months ago. Unfortunately, the main issue
>>    we had, the 403 error, is still there.
>>
>>    Sergey
>>
>>
>>    On 02/23/2012 02:46 AM, Estanislao Gonzalez wrote:
>>
>>        Hi Hans,
>>
>>        I was about to suggest using /usr/local/java :-)
>>
>>        Don't worry about the wms config error... I have that too... As we don't
>>        use it, there's no real harm, though indeed it would be great to set it up
>>        properly.
>>
>>        From your mail I'd think you are ready to go. I haven't completely
>>        understood whether you have increased the memory allocation or whether it
>>        was already at the values you show.
>>        If the former is true, then the memory and open file increases should
>>        solve all the problems you were having.
>>
>>        Cheers,
>>        Estani
>>
>>        On 23.02.2012 03:31, Hans Vahlenkamp wrote:
>>
>>            Some more information... I found out why running jmap was failing. We
>>            have another Java version installed, the one provided by Red Hat with
>>            RHEL 5, which perhaps should be removed. Using jmap from the Java
>>            version provided with the data node software works.
>>
>>            [root@data2 ~]# sudo /usr/local/java/bin/jmap -heap $(sudo jps | grep -iv jps)
>>            Attaching to process ID 28509, please wait...
>>            Debugger attached successfully.
>>            Server compiler detected.
>>            JVM version is 19.0-b09
>>
>>            using thread-local object allocation.
>>            Parallel GC with 10 thread(s)
>>
>>            Heap Configuration:
>>            MinHeapFreeRatio = 40
>>            MaxHeapFreeRatio = 70
>>            MaxHeapSize = 17179869184 (16384.0MB)
>>            NewSize = 1310720 (1.25MB)
>>            MaxNewSize = 17592186044415 MB
>>            OldSize = 5439488 (5.1875MB)
>>            NewRatio = 2
>>            SurvivorRatio = 8
>>            PermSize = 21757952 (20.75MB)
>>            MaxPermSize = 536870912 (512.0MB)
>>
>>            Heap Usage:
>>            PS Young Generation
>>            Eden Space:
>>            capacity = 4295032832 (4096.0625MB)
>>            used = 3749511000 (3575.812339782715MB)
>>            free = 545521832 (520.2501602172852MB)
>>            87.29877387815041% used
>>            From Space:
>>            capacity = 715784192 (682.625MB)
>>            used = 715764432 (682.6061553955078MB)
>>            free = 19760 (0.0188446044921875MB)
>>            99.99723939139466% used
>>            To Space:
>>            capacity = 715784192 (682.625MB)
>>            used = 0 (0.0MB)
>>            free = 715784192 (682.625MB)
>>            0.0% used
>>            PS Old Generation
>>            capacity = 11453267968 (10922.6875MB)
>>            used = 4231302192 (4035.284225463867MB)
>>            free = 7221965776 (6887.403274536133MB)
>>            36.94406001694974% used
>>            PS Perm Generation
>>            capacity = 85262336 (81.3125MB)
>>            used = 85221144 (81.2732162475586MB)
>>            free = 41192 (0.03928375244140625MB)
>>            99.95168792935722% used
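The VMVersionMismatchException in the earlier jmap attempt comes from running a jmap whose JDK differs from the JVM running Tomcat (supported 19.1-b02 vs. target 19.0-b09). A minimal sketch, assuming Linux /proc and a JDK laid out with bin/jmap next to either bin/java or jre/bin/java, that picks the jmap belonging to the running JVM instead of whichever one comes first on PATH. The function names are hypothetical helpers, not part of the data node software.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): locate the jmap that ships with the same JDK as
# a running JVM, so the serviceability agent and target VM versions match.
# Assumes Linux /proc and a JDK with bin/jmap next to bin/java or jre/bin/java.

# Map a java executable path to the JDK bin directory expected to hold jmap.
jdk_bin_dir() {
  case "$1" in
    */jre/bin/java) echo "${1%/jre/bin/java}/bin" ;;
    */bin/java)     echo "${1%/java}" ;;
    *)              return 1 ;;
  esac
}

# Print the jmap matching the JVM with the given PID; fail if none is found.
matching_jmap() {
  local exe bin
  exe=$(readlink -f "/proc/$1/exe") || return 1
  bin=$(jdk_bin_dir "$exe") || return 1
  [ -x "$bin/jmap" ] || return 1
  echo "$bin/jmap"
}
```

Run it as root, since /proc/PID/exe of the tomcat user's process is not readable by other users, e.g. `jmap=$(matching_jmap 28509) && "$jmap" -heap 28509`.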
>>
>>            However, we are still getting frequent "HTTP Status 403 - Access
>>            Denied."
>>            failures when trying to download files directly from our local TDS.
>>
>>            Hans
>>
>>
>>            On 02/22/2012 08:21 PM, Hans Vahlenkamp wrote:
>>
>>                Hello Estani,
>>
>>                After restarting our data node, the ORP address
>>                "https://esgdata.gfdl.noaa.gov/OpenidRelyingParty/home.htm"
>>                is functioning again. Trying to see the memory map of the Java
>>                process is currently failing:
>>
>>                [root@data2 bin]# sudo jmap -heap $(sudo jps | grep -iv jps)
>>                Attaching to process ID 28509, please wait...
>>                Exception in thread "main" java.lang.reflect.InvocationTargetException
>>                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>                at java.lang.reflect.Method.invoke(Method.java:597)
>>                at sun.tools.jmap.JMap.runTool(JMap.java:179)
>>                at sun.tools.jmap.JMap.main(JMap.java:110)
>>                Caused by: sun.jvm.hotspot.runtime.VMVersionMismatchException: Supported versions are 19.1-b02. Target VM is 19.0-b09
>>                at sun.jvm.hotspot.runtime.VM.checkVMVersion(VM.java:224)
>>                at sun.jvm.hotspot.runtime.VM.<init>(VM.java:287)
>>                at sun.jvm.hotspot.runtime.VM.initialize(VM.java:357)
>>                at sun.jvm.hotspot.bugspot.BugSpotAgent.setupVM(BugSpotAgent.java:594)
>>                at sun.jvm.hotspot.bugspot.BugSpotAgent.go(BugSpotAgent.java:494)
>>                at sun.jvm.hotspot.bugspot.BugSpotAgent.attach(BugSpotAgent.java:332)
>>                at sun.jvm.hotspot.tools.Tool.start(Tool.java:163)
>>                at sun.jvm.hotspot.tools.HeapSummary.main(HeapSummary.java:39)
>>                ... 6 more
>>
>>                although I recall that it worked previously.
>>
>>                I'm not sure if this is a related problem, but after a restart
>>                we noticed these entries in the "catalina.err" file:
>>
>>                SEVERE: StandardWrapper.Throwable
>>                org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'wmsController' defined in ServletContext resource [/WEB-INF/wms-servlet.xml]: Invocation of init method failed; nested exception is thredds.server.wms.config.WmsConfigException: Could not find wmsConfig.xml
>>                at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1336)
>>                at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:471)
>>                at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory$1.run(AbstractAutowireCapableBeanFactory.java:409)
>>                at java.security.AccessController.doPrivileged(Native Method)
>>                at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:380)
>>                at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:264)
>>                at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:220)
>>                at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:261)
>>                at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:185)
>>                at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:164)
>>                at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:429)
>>                at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:729)
>>                at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:381)
>>                at org.springframework.web.servlet.FrameworkServlet.createWebApplicationContext(FrameworkServlet.java:402)
>>                at org.springframework.web.servlet.FrameworkServlet.initWebApplicationContext(FrameworkServlet.java:316)
>>                at org.springframework.web.servlet.FrameworkServlet.initServletBean(FrameworkServlet.java:282)
>>                at org.springframework.web.servlet.HttpServletBean.init(HttpServletBean.java:126)
>>                at javax.servlet.GenericServlet.init(GenericServlet.java:212)
>>                at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1173)
>>                at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:993)
>>                at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:4420)
>>                at org.apache.catalina.core.StandardContext.start(StandardContext.java:4733)
>>                at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
>>                at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
>>                at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
>>                at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
>>                at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
>>                at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
>>                at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1315)
>>                at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
>>                at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
>>                at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1061)
>>                at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
>>                at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
>>                at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
>>                at org.apache.catalina.core.StandardService.start(StandardService.java:525)
>>                at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
>>                at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
>>                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>                at java.lang.reflect.Method.invoke(Method.java:597)
>>                at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
>>                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>                at java.lang.reflect.Method.invoke(Method.java:597)
>>                at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:219)
>>                Caused by: thredds.server.wms.config.WmsConfigException: Could not find wmsConfig.xml
>>                at thredds.server.wms.ThreddsWmsController.init(ThreddsWmsController.java:99)
>>                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>                at java.lang.reflect.Method.invoke(Method.java:597)
>>                at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeCustomInitMethod(AbstractAutowireCapableBeanFactory.java:1412)
>>                at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1373)
>>                at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1333)
>>                ... 47 more
>>
>>                I'm not sure why this is occurring, since the
>>                "/usr/local/apache-tomcat-6.0.32/content/thredds/wmsConfig.xml"
>>                file exists.
>>
>>                We increased the maximum number of open files to 4096 for the
>>                tomcat user and have the Java memory options set to "-Xmx16384m
>>                -Xms16384m -XX:MaxPermSize=512m".
>>                Also, the last software update we did on the data node was in
>>                December.
>>
>>                Thanks,
>>
>>                Hans and Sergey
>>
>>
>>                On 02/22/2012 10:00 AM, Estanislao Gonzalez wrote:
>>
>>                    Hi Sergei,
>>
>>                    I have a terrible memory so don't ask for the impossible ;-)
>>
>>                    I guess there are multiple things happening at the same
>>                    time, which is pretty standard in the CMIP5 context...
>>
>>                    You do have a problem that has nothing to do with pcmdi3
>>                    being overloaded (which it is, and which causes many other
>>                    problems).
>>
>>                    I've followed the link you sent and got a 500 error, which
>>                    is not good at all...
>>
>>                    java.lang.NoClassDefFoundError: org/springframework/web/util/UriUtils
>>                    org.springframework.web.util.UrlPathHelper.decodeRequestString(UrlPathHelper.java:307)
>>                    org.springframework.web.util.UrlPathHelper.getContextPath(UrlPathHelper.java:213)
>>                    org.springframework.web.util.UrlPathHelper.getPathWithinApplication(UrlPathHelper.java:163)
>>
>>
>>                    This was thrown by the ORP... (see
>>                    https://esgdata.gfdl.noaa.gov/OpenidRelyingParty/home.htm)
>>                    Maybe Luca can help with this... but tell me something: have
>>                    you updated or changed the node in any particular way?
>>                    There are multiple possible causes... one being that you ran
>>                    out of memory in the PermGen space, so the class could no
>>                    longer be loaded...
>>                    Could you check the catalina logs? If they are huge, stop
>>                    the node, move the logs somewhere (tag them with a timestamp
>>                    in the name) and let it create new ones; that will make
>>                    debugging easier.
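Setting the old logs aside under a timestamped name, as suggested above, can be scripted. A minimal sketch, assuming the node has already been stopped; `archive_log` is a hypothetical helper name, and the catalina.err path in the comment is only an example that depends on how Tomcat was installed.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): rename a log to <name>.<YYYYmmdd-HHMMSS> so the
# server starts a fresh file on its next start. Stop the node before using it.

archive_log() {
  local log="$1" dest
  if [ ! -f "$log" ]; then
    echo "archive_log: no such file: $log" >&2
    return 1
  fi
  dest="$log.$(date +%Y%m%d-%H%M%S)"
  if [ -e "$dest" ]; then
    echo "archive_log: refusing to overwrite $dest" >&2
    return 1
  fi
  # Print the new name on success so callers can compress or copy it later.
  mv -- "$log" "$dest" && echo "$dest"
}

# Example (path is an assumption; adjust to your Tomcat installation):
# archive_log /usr/local/apache-tomcat-6.0.32/logs/catalina.err
```

Renaming rather than deleting keeps the old log available if the root cause turns out to be recorded in it.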
>>
>>                    There should be a message telling why the ORP is not working
>>                    anymore...
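To find that message quickly, a rough sketch that counts the lines matching a few failure signatures in a Catalina log. The first two patterns are errors quoted in this thread; the two OutOfMemoryError patterns are the standard HotSpot messages for an exhausted PermGen or heap. `scan_log` is a hypothetical helper, and the log path in the comment is an assumption.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print "<matching lines><TAB><pattern>" for a few
# failure signatures in a Catalina log. Patterns are matched as fixed strings.

scan_log() {
  local log="$1" pat
  [ -r "$log" ] || return 1
  while IFS= read -r pat; do
    # grep -c prints 0 when nothing matches (its exit status is then 1)
    printf '%s\t%s\n' "$(grep -cF -- "$pat" "$log")" "$pat"
  done <<'EOF'
Too many open files
java.lang.NoClassDefFoundError
java.lang.OutOfMemoryError: PermGen space
java.lang.OutOfMemoryError: Java heap space
EOF
}

# Example (path is an assumption):
# scan_log /usr/local/apache-tomcat-6.0.32/logs/catalina.out
```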
>>
>>                    Also, could you send me this:
>>                    # information on tomcat, provided it's the only java server
>>                    # on the node:
>>                    sudo jmap -heap $(sudo jps | grep -iv jps)
>>                    # current java parameters
>>                    grep Xmx /etc/esg.env
>>
>>                    This is what I have, for comparison:
>>                    $ grep Xmx /etc/esg.env
>>                    export JAVA_OPTS="-Xmx15G -Xms10G -XX:MaxPermSize=512m -XX:NewRatio=9"
>>                    $ sudo jmap -heap $(sudo jps | grep -iv jps)
>>                    Attaching to process ID 15222, please wait...
>>                    Debugger attached successfully.
>>                    Server compiler detected.
>>                    JVM version is 19.0-b09
>>
>>                    using thread-local object allocation.
>>                    Parallel GC with 8 thread(s)
>>
>>                    Heap Configuration:
>>                    MinHeapFreeRatio = 40
>>                    MaxHeapFreeRatio = 70
>>                    MaxHeapSize = 16106127360 (15360.0MB)
>>                    NewSize = 1310720 (1.25MB)
>>                    MaxNewSize = 17592186044415 MB
>>                    OldSize = 5439488 (5.1875MB)
>>                    NewRatio = 9
>>                    SurvivorRatio = 8
>>                    PermSize = 21757952 (20.75MB)
>>                    MaxPermSize = 536870912 (512.0MB)
>>
>>                    Heap Usage:
>>                    PS Young Generation
>>                    Eden Space:
>>                    capacity = 757792768 (722.6875MB)
>>                    used = 332487616 (317.08489990234375MB)
>>                    free = 425305152 (405.60260009765625MB)
>>                    43.875796924997836% used
>>                    From Space:
>>                    capacity = 152043520 (145.0MB)
>>                    used = 82383968 (78.56747436523438MB)
>>                    free = 69659552 (66.43252563476562MB)
>>                    54.184465079471984% used
>>                    To Space:
>>                    capacity = 157483008 (150.1875MB)
>>                    used = 0 (0.0MB)
>>                    free = 157483008 (150.1875MB)
>>                    0.0% used
>>                    PS Old Generation
>>                    capacity = 9663676416 (9216.0MB)
>>                    used = 7712867544 (7355.563682556152MB)
>>                    free = 1950808872 (1860.4363174438477MB)
>>                    79.81297398606937% used
>>                    PS Perm Generation
>>                    capacity = 79757312 (76.0625MB)
>>                    used = 79734640 (76.04087829589844MB)
>>                    free = 22672 (0.0216217041015625MB)
>>                    99.97157376617707% used
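The two configurations can be compared with a little arithmetic. NewRatio is the ratio of the old generation to the young generation, so the initial old generation is about heap × NewRatio / (NewRatio + 1). That matches both jmap outputs: 16384 MB × 2/3 ≈ 10922 MB for GFDL (jmap: 10922.6875MB) and 10240 MB × 9/10 = 9216 MB here (jmap: 9216.0MB). A minimal sketch of the calculation from JAVA_OPTS-style flags; the helper names are hypothetical, and the Parallel collector's adaptive sizing can move these numbers at run time.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): estimate the initial old generation from
# JAVA_OPTS-style flags, using old = heap * NewRatio / (NewRatio + 1).

# Convert a HotSpot size value (16384m, 15G, 512k, or plain bytes) to whole MB.
size_to_mb() {
  local num unit
  num=${1%[kKmMgG]}
  unit=${1#"$num"}
  case "$num" in ''|*[!0-9]*) return 1 ;; esac
  case "$unit" in
    g|G) echo $(( num * 1024 )) ;;
    m|M) echo "$num" ;;
    k|K) echo $(( num / 1024 )) ;;
    '')  echo $(( num / 1024 / 1024 )) ;;
  esac
}

# Print the value following PREFIX (e.g. -Xms or -XX:NewRatio=) in OPTS.
flag_value() {
  local opts="$1" prefix="$2" tok
  for tok in $opts; do  # unquoted on purpose: split the option string on spaces
    case "$tok" in
      "$prefix"*) echo "${tok#"$prefix"}"; return 0 ;;
    esac
  done
  return 1
}

# Old generation in MB (integer division) for a given heap size and NewRatio.
old_gen_mb() {
  echo $(( $1 * $2 / ($2 + 1) ))
}

# Example with the settings quoted in this thread:
# opts="-Xmx15G -Xms10G -XX:MaxPermSize=512m -XX:NewRatio=9"
# old_gen_mb "$(size_to_mb "$(flag_value "$opts" -Xms)")" "$(flag_value "$opts" -XX:NewRatio=)"
```

When NewRatio is not set, as in the GFDL options, the jmap output above shows the JVM default of 2.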
>>
>>                    Thanks,
>>                    Estani
>>
>>                    On 22.02.2012 15:35, Serguei Nikonov wrote:
>>
>>                        Hi Estani,
>>
>>                        Indeed, over the last two days we experienced the "Too many
>>                        open files" error in tomcat. That forced us to clear out the
>>                        catalina log file, which had grown so big that it filled the
>>                        hard drive. Thanks for your advice on how to prevent this
>>                        issue. Currently, TDS is up but has limited functionality.
>>                        Very often (maybe most of the time) files cannot be accessed:
>>                        TDS gives a "403 - Access Denied" error when trying to
>>                        download them. We had this issue 3 months ago. You may
>>                        remember it, because you were the first to point out this
>>                        problem. Interestingly, at the same time the files are
>>                        downloadable from the pcmdi gateway but not directly from the
>>                        data node TDS. Bob fixed it that time on the pcmdi side. Now
>>                        we have very similar symptoms, but he is sure that this is
>>                        because the gateway is too busy.
>>
>>                        Just for example, the file in this dataset
>>                        http://esgdata.gfdl.noaa.gov/thredds/esgcet/1/cmip5.output1.NOAA-GFDL.GFDL-CM3.1pctCO2.mon.atmos.Amon.r1i1p1.v20110601.html?dataset=cmip5.output1.NOAA-GFDL.GFDL-CM3.1pctCO2.mon.atmos.Amon.r1i1p1.v20110601.ccb_Amon_GFDL-CM3_1pctCO2_r1i1p1_002601-003012.nc
>>
>>                        was accessible when I tried before writing this email, and
>>                        now it is not.
>>
>>                        At the same time, it is always downloadable from the pcmdi
>>                        gateway.
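To put a number on the success rate (about 1% of direct requests, per Sergey), a rough sketch that requests a file URL repeatedly with curl and tallies the HTTP status codes. It assumes you pass whatever credentials the data node expects through extra curl options (for example --cert/--key for a client certificate); without them, requests may be redirected or denied, which would also count as failures. The helper names and the FILE_URL and CERT variables are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): request a URL repeatedly and tally how many
# attempts returned HTTP 200.

# Read one HTTP status code per line on stdin; print "ok/total (pct%)".
tally_codes() {
  local code ok=0 total=0
  while IFS= read -r code || [ -n "$code" ]; do
    [ -n "$code" ] || continue
    total=$((total + 1))
    if [ "$code" = 200 ]; then ok=$((ok + 1)); fi
  done
  [ "$total" -gt 0 ] || return 1
  echo "$ok/$total ($((100 * ok / total))%)"
}

# Fetch URL N times, discarding the body and printing only the status code.
# Extra arguments are passed to curl (e.g. --cert/--key for credentials).
probe_url() {
  local url="$1" n="$2" i=0
  shift 2
  while [ "$i" -lt "$n" ]; do
    curl -s -o /dev/null -w '%{http_code}\n' "$@" "$url"
    i=$((i + 1))
  done
}

# Example (needs network access; FILE_URL and CERT are placeholders):
# probe_url "$FILE_URL" 50 --cert "$CERT" | tally_codes
```

Running the same probe against the gateway link and the direct TDS link would show whether the failures are specific to the data node.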
>>
>>                        regards,
>>                        Sergey
>>
>>
>>                        On 02/22/2012 05:41 AM, Estanislao Gonzalez wrote:
>>
>>                            Hi Sergei,
>>
>>                            Your node is down because of too many open files:
>>                            http://esgdata.gfdl.noaa.gov/esgf-node-manager/
>>
>>                            java.io.FileNotFoundException: /usr/local/apache-tomcat-6.0.32/webapps/esgf-node-manager/index.html (Too many open files)
>>                            java.io.FileInputStream.open(Native Method)
>>                            java.io.FileInputStream.<init>(FileInputStream.java:106)
>>                            org.apache.naming.resources.FileDirContext$FileResource.streamContent(FileDirContext.java:927)
>>                            org.apache.catalina.servlets.DefaultServlet.copy(DefaultServlet.java:1832)
>>                            org.apache.catalina.servlets.DefaultServlet.serveResource(DefaultServlet.java:919)
>>                            org.apache.catalina.servlets.DefaultServlet.doGet(DefaultServlet.java:398)
>>                            javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
>>                            javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
>>
>>                            The catalina logs should also have messages
>>                            regarding this.
>>                            This is how you can prevent it from happening again:
>>                            http://esgf.org/wiki/ESGFNode/FAQ#Tomcat_is_complaining_about_too_many_open_files
>>
>>                            Now, you'll have to restart the node.
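After the restart, whether a raised limit actually applies to the running JVM can be checked from /proc. A minimal sketch, assuming a Linux kernel that provides /proc/PID/fd and /proc/PID/limits and root access to inspect the tomcat user's process; `fd_usage` is a hypothetical helper that prints open descriptors against the soft "Max open files" limit.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print "<open fds>/<soft Max open files>" for a
# PID. Assumes Linux /proc/PID/fd and /proc/PID/limits; needs root for other
# users' processes.

fd_usage() {
  local pid="$1" open limit
  [ -r "/proc/$pid/fd" ] || return 1
  open=$(find "/proc/$pid/fd" -mindepth 1 -maxdepth 1 | wc -l) || return 1
  # The limits line looks like: "Max open files   <soft>   <hard>   files"
  limit=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits") || return 1
  echo "$((open))/$limit"
}

# Example, using the Tomcat PID reported by jmap earlier in this thread:
# fd_usage 28509
```

If the soft limit printed is still the old value, the new limit was not picked up by the process that launched Tomcat.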
>>
>>                            Thanks,
>>                            Estani
>>
>


