By Ram Lakshmanan.
Figure 1: System usage chart
Have you ever encountered a situation where your application’s CPU maxes out and never comes down, even when traffic volume drops? Did you have to recycle the JVM to remedy the problem? Even after you recycle the JVM, does the CPU start to spike again after some time?
This type of problem surfaces because of one of the following reasons:
- Repeated Full GC
- Non-terminating Loops
- Non-synchronized access to java.util.HashMap
Let’s see how to diagnose these scenarios and address them.
Scenario 1: Repeated Full GC
Full GC is an important phase of the garbage collection process. During this phase, the entire JVM is frozen, and every object in memory is evaluated for garbage collection, which makes it a CPU-intensive operation. If an application has a memory leak, “Full GC” will start to run repeatedly while reclaiming little or no memory. When “Full GC” runs repeatedly, CPU usage spikes and never comes down.
Figure 2: The HP Jmeter tool showing repeated runs of the Full GC process
Tactical Solution
To resolve the problem completely, the memory leak in the application has to be fixed, and resolving memory leaks might take some time. (Of course, it goes without saying, you can engage experts like me to resolve it quickly.) Until then, the following tactical solution can keep the application functioning in production. Set up a script that checks the application’s garbage collection log file every two minutes. If the script notices more than three “Full GC” runs in a 10-minute window, that JVM should be taken out of production traffic. The JVM should then be recycled after a thread dump and a heap dump have been captured. After recycling, the JVM should be placed back into production to take active traffic.
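The check that such a script performs can be sketched in Java. This is only an illustration under stated assumptions: the class name FullGcWatcher and its methods are hypothetical, and the pattern assumes log lines of the form "<uptime-seconds>: [Full GC ...", as older HotSpot JVMs write with -XX:+PrintGCTimeStamps. Other log layouts, such as the unified logging of newer JVMs, would need a different pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: flags a JVM whose GC log shows too many Full GC runs. */
final class FullGcWatcher {

    // Assumed log layout: "<uptime-seconds>: [Full GC ...".
    // Adjust the pattern if your GC log looks different.
    private static final Pattern FULL_GC_LINE =
            Pattern.compile("(\\d+(?:\\.\\d+)?): \\[Full GC");

    /** Returns the JVM uptime stamp (in seconds) of every Full GC line. */
    static List<Double> fullGcTimestamps(List<String> logLines) {
        List<Double> stamps = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = FULL_GC_LINE.matcher(line);
            if (m.find()) {
                stamps.add(Double.parseDouble(m.group(1)));
            }
        }
        return stamps;
    }

    /**
     * Returns true when more than maxRuns Full GC runs fall inside the
     * window of windowSeconds that ends at nowSeconds (JVM uptime).
     */
    static boolean shouldDecommission(List<String> logLines, double nowSeconds,
                                      double windowSeconds, int maxRuns) {
        int runs = 0;
        for (double stamp : fullGcTimestamps(logLines)) {
            if (stamp > nowSeconds - windowSeconds && stamp <= nowSeconds) {
                runs++;
            }
        }
        return runs > maxRuns;
    }
}
```

With the thresholds described above, the call would be shouldDecommission(lines, now, 600, 3), repeated every two minutes. Taking the JVM out of traffic and capturing the dumps depends on your environment, so it is left out of the sketch.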
Strategic Solution
Use the heap dump and thread dump to identify and fix the root cause of the problem.
Scenario 2: Non-terminating Loops
Sometimes, due to a bug in your code or in a third-party library that you use, loop constructs (while, for, do..while) may run forever. Consider the following scenario:
while (myCondition) {
    statement-1;
    statement-2;
    ...
    statement-n;
}
Due to a certain data condition or a bug in the code, “myCondition” may never become false. In that case, the thread spins in the while loop indefinitely, which causes the CPU to spike. Unless the JVM is restarted, the CPU stays maxed out.
Solution
When you observe the CPU maxing out and utilization not coming down, take two thread dumps about 10 seconds apart, right while the problem is happening. Note every thread that is in the “runnable” state in the first thread dump, then check the state of the same threads in the second thread dump. If a thread is still in the “runnable” state within the same method in the second dump, that indicates the part of the code in which the thread is looping infinitely. Once you know which part of the code is looping, addressing the problem should be straightforward.
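The comparison of two thread dumps can also be sketched in code. The following illustration takes both dumps from inside the JVM through the standard java.lang.management API rather than from files. The class SpinningThreadFinder and its methods are hypothetical names. For each thread that is RUNNABLE in both dumps, it reports the deepest frame the two stacks share, compared by class and method name, which is only a heuristic for "still in the same method".

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: finds threads that stay RUNNABLE in the same code across two dumps. */
final class SpinningThreadFinder {

    /** Captures an in-process thread dump, keyed by thread id. */
    static Map<Long, ThreadInfo> snapshot() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        Map<Long, ThreadInfo> dump = new HashMap<>();
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            if (info != null) {
                dump.put(info.getThreadId(), info);
            }
        }
        return dump;
    }

    /** Walks both stacks from the bottom and returns the last frame they have in common, or null. */
    static StackTraceElement deepestSharedFrame(StackTraceElement[] a, StackTraceElement[] b) {
        StackTraceElement shared = null;
        int i = a.length - 1;
        int j = b.length - 1;
        while (i >= 0 && j >= 0
                && a[i].getClassName().equals(b[j].getClassName())
                && a[i].getMethodName().equals(b[j].getMethodName())) {
            shared = a[i];
            i--;
            j--;
        }
        return shared;
    }

    /** Lists threads that are RUNNABLE in both dumps, with their deepest shared frame. */
    static List<String> suspects(Map<Long, ThreadInfo> first, Map<Long, ThreadInfo> second) {
        List<String> report = new ArrayList<>();
        for (ThreadInfo before : first.values()) {
            ThreadInfo after = second.get(before.getThreadId());
            if (after == null
                    || before.getThreadState() != Thread.State.RUNNABLE
                    || after.getThreadState() != Thread.State.RUNNABLE) {
                continue;
            }
            StackTraceElement shared =
                    deepestSharedFrame(before.getStackTrace(), after.getStackTrace());
            if (shared != null) {
                report.add(before.getThreadName()
                        + " is RUNNABLE in both dumps; deepest shared frame: "
                        + shared.getClassName() + "." + shared.getMethodName());
            }
        }
        return report;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Long, ThreadInfo> first = snapshot();
        Thread.sleep(10_000); // the 10-second gap between the two dumps
        Map<Long, ThreadInfo> second = snapshot();
        suspects(first, second).forEach(System.out::println);
    }
}
```

The thread that takes the dumps is itself runnable at both moments, so it appears in the report as well, and a busy but healthy thread can too. Treat the output as a list of candidates to inspect, not as a verdict.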
Example
Listing 1 is an excerpt from the thread dump in which ‘Thread-0’ is looping infinitely in the sample application:
"Thread-0" prio=6 tid=0x000000000b583000 nid=0x10adc runnable [0x000000000cb6f000]
   java.lang.Thread.State: RUNNABLE
    at com.tier1app.NonTerminatingLooper$LoopForeverThread.loopForever(NonTerminatingLooper.java:32)
    at com.tier1app.NonTerminatingLooper$LoopForeverThread.method2(NonTerminatingLooper.java:27)
    at com.tier1app.NonTerminatingLooper$LoopForeverThread.method1(NonTerminatingLooper.java:22)
    at com.tier1app.NonTerminatingLooper$LoopForeverThread.run(NonTerminatingLooper.java:16)
Listing 1: Stack Trace of “Thread-0” in first thread dump
"Thread-0" prio=6 tid=0x000000000b583000 nid=0x10adc runnable [0x000000000cb6f000]
   java.lang.Thread.State: RUNNABLE
    at com.tier1app.NonTerminatingLooper$LoopForeverThread.loopForever(NonTerminatingLooper.java:32)
    at com.tier1app.NonTerminatingLooper$LoopForeverThread.method2(NonTerminatingLooper.java:27)
    at com.tier1app.NonTerminatingLooper$LoopForeverThread.method1(NonTerminatingLooper.java:22)
    at com.tier1app.NonTerminatingLooper$LoopForeverThread.run(NonTerminatingLooper.java:16)
Listing 2: Stack Trace of “Thread-0” in second thread dump that was taken 10 seconds after the first thread dump
From the stack traces, you can infer that “Thread-0” is spinning infinitely in the loopForever() method. The source code of the loopForever() method follows. Here, you can see the non-terminating condition, “while (true)”.
public void loopForever() {
    while (true) {
        new String("Loop forever");
    }
}
Scenario 3: Non-synchronized Access of java.util.HashMap
When multiple threads call HashMap’s get() and put() APIs concurrently without synchronization, the threads can go into an infinite loop. This problem doesn’t always happen, but it does happen on occasion. Here is a detailed blog that describes this problem.
Solution
When you observe the CPU maxing out and utilization not coming down, take a thread dump right while the problem is happening. Look for the threads that are in the “runnable” state. If such a thread is working in HashMap’s get() or put() API, it indicates that the HashMap is causing the CPU spike. You can then replace that HashMap with a ConcurrentHashMap.
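As a minimal sketch of that replacement, the following example lets several threads write to one shared ConcurrentHashMap at the same time. The class SharedMapExample, the method fillConcurrently, and the key format are hypothetical and exist only for illustration. In real code the change is usually just the type that is instantiated.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical example: several writer threads share one map safely. */
final class SharedMapExample {

    /** Fills one shared map from several threads and returns it once all writers finish. */
    static Map<String, Integer> fillConcurrently(int writers, int keysPerWriter) {
        // ConcurrentHashMap supports concurrent get() and put() without external
        // locking; a plain java.util.HashMap does not.
        Map<String, Integer> shared = new ConcurrentHashMap<>();

        Thread[] threads = new Thread[writers];
        for (int w = 0; w < writers; w++) {
            final int writer = w;
            threads[w] = new Thread(() -> {
                for (int k = 0; k < keysPerWriter; k++) {
                    shared.put("writer-" + writer + "-key-" + k, k);
                }
            });
            threads[w].start();
        }

        // Wait for every writer before handing the map back.
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
        return shared;
    }
}
```

Because the variable is declared as Map, the calling code does not change when the implementation is swapped. Compound operations such as "check, then put" still need an atomic method like putIfAbsent() or merge() to be safe.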
Example
Following is an excerpt from the thread dump that indicates the infinite looping that is happening in HashMap:
"Thread-0" prio=6 tid=0x000000000b583000 nid=0x10adc runnable [0x000000000cb6f000]
   java.lang.Thread.State: RUNNABLE
    at java.util.HashMap.put(HashMap.java:374)
    at com.tier1app.HashMapLooper$AddForeverThread.AddForever(NonTerminatingLooper.java:32)
    at com.tier1app.HashMapLooper$AddForeverThread.method2(NonTerminatingLooper.java:27)
    at com.tier1app.HashMapLooper$AddForeverThread.method1(NonTerminatingLooper.java:22)
    at com.tier1app.NonTerminatingLooper$LoopForeverThread.run(NonTerminatingLooper.java:16)
About the Author
Every single day, millions and millions of people in North America—bank, travel, and commerce—use the applications that Ram Lakshmanan has architected. Ram is an acclaimed speaker in major conferences on scalability, availability, and performance topics. Recently, he has founded a startup, which specializes in troubleshooting performance problems.
*** This article was contributed. © All rights reserved, Developer.com. ***