Infrastructure at your Service

Pascal Brand

WebLogic Server process takes 100% CPU

During some monitoring work on a WebLogic 12.1.3 environment, I noticed that top shows the process 12013 using 800% CPU (We have a 12 CPU machine) and had to find out what’s happening !!

I used top to check the process ID and then the threads taking the CPU

ps -ef |grep 12013

weblogic 12013 11836 23 Apr03 ?        03:38:31 /app/weblogic/Java/jdk/bin/java -server -Xms2048m -Xmx2048m -XX:MaxMetaspaceSize=512m -Dweblogic.Name=msD2-02 -Djava.security.policy=/app/weblogic/Middleware/wlserver/server/lib/weblogic.policy -Dweblogic.ProductionModeEnabled=true -Dweblogic.system.BootIdentityFile=/app/weblogic/domains/myDomain/servers/msD2-02/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.nmservice.RotationEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=false -Dweblogic.ReverseDNSAllowed=false -Xms2048m -Xmx2048m -XX:MaxMetaspaceSize=512m -Dcom.sun.xml.ws.api.streaming.XMLStreamReaderFactory.woodstox=true -Dcom.sun.xml.ws.api.streaming.XMLStreamWriterFactory.woodstox=true -Djava.io.tmpdir=/app/weblogic/tmp/myDomain/msD2-02 -Ddomain.home=/app/weblogic/domains/myDomain -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.security.SSL.protocolVersion=TLS1 -Dweblogic.security.disableNullCipher=true -Djava.security.egd=file:///dev/./urandom -Dweblogic.security.allowCryptoJDefaultJCEVerification=true -Dweblogic.nodemanager.ServiceEnabled=true -Djava.endorsed.dirs=/app/weblogic/Java/jdk1.8.0_45/jre/lib/endorsed:/app/weblogic/Middleware/wlserver/../oracle_common/modules/endorsed -da -Dwls.home=/app/weblogic/Middleware/wlserver/server -Dweblogic.home=/app/weblogic/Middleware/wlserver/server -Dweblogic.management.server=https://10.10.1.1:7001 -Dweblogic.utils.cmm.lowertier.ServiceDisabled=true weblogic.Server
top -H -b -p 12013

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
12624 weblogic  20   0 10.5g 3.6g  30m S 83.1 11.5  18053:38 java
 2540 weblogic  20   0 10.5g 3.6g  30m S 79.4 11.5   3:11.62 java
 2591 weblogic  20   0 10.5g 3.6g  30m S 79.4 11.5   3:08.02 java
12436 weblogic  20   0 10.5g 3.6g  30m S  9.2 11.5   3:14.59 java

Now that I have the Unix threads PID, I will get the thread dump of the WebLogic Server process and try to figure out the culprits. To generate the thread dump I used the jstack java tool.

jstack -l 12013 >  $HOME/jstack_100_CPU.txt

The Thread dump provides the threads with the ID in Hexadecimal, thus those provided from the top needs to be converted:

[myDomain]$ printf '%x\n' 12624
3150
[myDomain]$  printf '%x\n' 2540
9ec
[myDomain]$  printf '%x\n' 2591
a1f
[myDomain]$
"LDAPConnThread-1172 ldaps://ldapserver.dbi-lab.com:3636" #167119 daemon prio=5 os_prio=0 tid=0x00007f5444087800 nid=0xa1f runnable [0x00007f5409e88000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
        - locked  (a sun.nio.ch.Util$2)
        - locked  (a java.util.Collections$UnmodifiableSet)
        - locked  (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
        at weblogic.socket.NIOSocketMuxer$NIOInputStream.readInternal(NIOSocketMuxer.java:802)
        at weblogic.socket.NIOSocketMuxer$NIOInputStream.read(NIOSocketMuxer.java:746)
        at weblogic.socket.NIOSocketMuxer$NIOInputStream.read(NIOSocketMuxer.java:729)
        at weblogic.socket.JSSEFilterImpl.readFromNetwork(JSSEFilterImpl.java:462)
        at weblogic.socket.JSSEFilterImpl.read(JSSEFilterImpl.java:424)
        at weblogic.socket.JSSESocket$JSSEInputStream.read(JSSESocket.java:98)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
        - locked  (a java.io.BufferedInputStream)
        at netscape.ldap.ber.stream.BERElement.getElement(Unknown Source)
        at netscape.ldap.LDAPConnThread.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:745)

 
"[STUCK] ExecuteThread: '302' for queue: 'weblogic.kernel.Default (self-tuning)'" #326 daemon prio=1 os_prio=0 tid=0x00007f5560376000 nid=0x3150 runnable [0x00007f543636e000]
   java.lang.Thread.State: RUNNABLE
        at java.io.InputStream.skip(InputStream.java:224)
        at weblogic.utils.http.HttpChunkInputStream.skip(HttpChunkInputStream.java:287)
        at weblogic.utils.http.HttpChunkInputStream.skipAllChunk(HttpChunkInputStream.java:497)
        at weblogic.servlet.internal.ServletInputStreamImpl.ensureChunkedConsumed(ServletInputStreamImpl.java:51)
        at weblogic.servlet.internal.ServletRequestImpl.skipUnreadBody(ServletRequestImpl.java:217)
        at weblogic.servlet.internal.ServletRequestImpl.reset(ServletRequestImpl.java:169)
        at weblogic.servlet.internal.HttpConnectionHandler.prepareRequestForReuse(HttpConnectionHandler.java:258)
        at weblogic.servlet.internal.HttpConnectionHandler.requeue(HttpConnectionHandler.java:665)
        at weblogic.servlet.internal.VirtualConnection.requeue(VirtualConnection.java:332)
        at weblogic.servlet.internal.ServletResponseImpl.send(ServletResponseImpl.java:1657)
        at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1582)
        at weblogic.servlet.provider.ContainerSupportProviderImpl$WlsRequestExecutor.run(ContainerSupportProviderImpl.java:255)
        at weblogic.work.ExecuteThread.execute(ExecuteThread.java:311)
        at weblogic.work.ExecuteThread.run(ExecuteThread.java:263)

"LDAPConnThread-1171 ldaps://ldapserver.dbi-lab.com:3636"" #167118 daemon prio=5 os_prio=0 tid=0x00007f5544048000 nid=0x9ec runnable [0x00007f540a28c000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.interrupt(Native Method)
        at sun.nio.ch.EPollArrayWrapper.interrupt(EPollArrayWrapper.java:317)
        at sun.nio.ch.EPollSelectorImpl.wakeup(EPollSelectorImpl.java:193)
        - locked  (a java.lang.Object)
        at java.nio.channels.spi.AbstractSelector$1.interrupt(AbstractSelector.java:213)
        at java.nio.channels.spi.AbstractSelector.begin(AbstractSelector.java:219)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:78)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
        - locked  (a sun.nio.ch.Util$2)
        - locked  (a java.util.Collections$UnmodifiableSet)
        - locked  (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
        at weblogic.socket.NIOSocketMuxer$NIOInputStream.readInternal(NIOSocketMuxer.java:802)
        at weblogic.socket.NIOSocketMuxer$NIOInputStream.read(NIOSocketMuxer.java:746)
        at weblogic.socket.NIOSocketMuxer$NIOInputStream.read(NIOSocketMuxer.java:729)
        at weblogic.socket.JSSEFilterImpl.readFromNetwork(JSSEFilterImpl.java:462)
        at weblogic.socket.JSSEFilterImpl.read(JSSEFilterImpl.java:424)
        at weblogic.socket.JSSESocket$JSSEInputStream.read(JSSESocket.java:98)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
        - locked  (a java.io.BufferedInputStream)
        at netscape.ldap.ber.stream.BERElement.getElement(Unknown Source)
        at netscape.ldap.LDAPConnThread.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:745)

This looks to be an issue with the LDAP server or the network.

The WebLogic java threads loops on the LDAP connection to get data. This loop should fail in a timeout depending on the Authentication provider LDAP connection timeout settings. But it looks there is an issue with the JDK that re-initiates the connection in the sun.nio.ch.EPollArrayWrapper class.

Once the LDAP connection was back the WebLogic Domain could be restarted properly and the service was back.

A possible workaround could be to not use the weblogic.socket.NIOSocketMuxer but the weblogic.socket.PosixSocketMuxer. This workaround is described in Oracle Support note:2128032.1
Or a better solution is to upgrade to the latest WebLogic Software version and apply the latest PSU where this issue is fixed.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Pascal Brand
Pascal Brand

Senior Consultant & Middleware Technology Leader