Investigating high CPU utilization of JVM processes – Finding the Culprit.
Recently, at one of our client sites, we observed a spike in CPU utilization. At random intervals, CPU usage climbed to 100% and stayed there until the application was restarted. The application is a standalone Java application that performs heavy database operations. It was critical to find and fix the issue quickly, because the high CPU usage affected other applications on the same server. In this post I explain the methodology we used to find and fix it.
Environment: Sun Solaris server, 12 cores
- Finding the process
We used the “top” command to find the applications using the most CPU.
Since the server has 12 cores, top initially reported each process's CPU usage summed across all cores.
Once we disabled Irix mode (by pressing ‘I’), it showed each process's CPU usage averaged over all cores. We then noted the PID of the process using the most CPU.
From the top manual: “Irix/Solaris mode toggle: When operating in 'Solaris mode' ('I' toggled Off), a task's CPU usage will be divided by the total number of CPUs. After issuing this command, you'll be informed of the new state of this toggle.”
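To make the difference between the two modes concrete, here is a small sketch of the arithmetic. The function name solaris_mode_cpu is ours, purely for illustration; it divides an Irix-mode reading by the number of CPUs, which is what the toggle description says top does.

```shell
# Convert an Irix-mode %CPU reading into its Solaris-mode equivalent.
# $1 = %CPU as shown in Irix mode, $2 = number of CPUs
# (solaris_mode_cpu is a hypothetical helper, not part of top)
solaris_mode_cpu() {
    awk -v pct="$1" -v ncpu="$2" 'BEGIN { printf "%.1f\n", pct / ncpu }'
}

solaris_mode_cpu 100 12    # one core fully busy        -> 8.3
solaris_mode_cpu 1200 12   # all twelve cores fully busy -> 100.0
```

So on a 12-core server, a single thread spinning on one core shows up as roughly 8.3% in Solaris mode, while the same thread shows as 100% in Irix mode.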
- Finding the thread
Once we had the PID of the process, we pressed the ‘H’ key to list its threads. We then noted the NID, or ‘Soft process Id’, of the busiest thread from the first column.
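If you would rather not do this interactively, the same lookup can be sketched as a script. This assumes a procps-style top (the one with the Irix/Solaris toggle) that accepts -H, -b, -n and -p and prints a header row containing PID and %CPU columns. The helper name hottest_thread is hypothetical.

```shell
# Print the ID of the thread with the highest %CPU.
# Reads batch-mode top output on stdin and locates the %CPU column
# from the header row instead of assuming a fixed position.
# (hottest_thread is a hypothetical helper)
hottest_thread() {
    awk '
        $1 == "PID" && !c {
            for (i = 1; i <= NF; i++) if ($i == "%CPU") c = i
            next
        }
        c && NF >= c && $c + 0 > max { max = $c + 0; tid = $1 }
        END { if (tid != "") print tid }
    '
}

# Usage, with PID set to the process found in the previous step:
#   top -H -b -n 1 -p "$PID" | hottest_thread
```

With -H, the first column of each row holds the thread ID rather than the process ID, so the printed value is the decimal ID to look for in the thread dump.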
- Get the thread dump
We used the following command to obtain a thread dump of the JVM process identified in the first step.
jstack -l PID > jstack.txt
- Isolate the stack trace
The thread dump file contains a stack trace for each thread, so the important part is identifying the right one. Each stack trace has an nid field in hexadecimal, which corresponds to the thread ID obtained in the second step. After converting that thread ID to hexadecimal, we searched the dump for the matching nid and isolated the stack trace.
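The conversion and lookup can be sketched as a small shell function. The name stack_for_thread is ours, and the sketch assumes the usual jstack layout, where each thread entry starts with a header line containing nid=0x... followed by a space and ends at the next blank line.

```shell
# Print the thread-dump entry whose nid matches a decimal thread ID.
# $1 = decimal thread ID (as shown by top), $2 = thread dump file
# (stack_for_thread is a hypothetical helper)
stack_for_thread() {
    # Convert the decimal ID to the hex form used in the dump, e.g. 12345 -> nid=0x3039
    nid=$(printf 'nid=0x%x' "$1")
    awk -v nid="$nid" '
        index($0, nid " ") { show = 1 }   # header line of the matching thread
        show && /^$/       { exit }       # a blank line ends the entry
        show               { print }
    ' "$2"
}

# Usage:
#   stack_for_thread 12345 jstack.txt
```

The trailing space in the match keeps a short ID such as 0x303 from matching a longer one such as 0x3039. Because the dump was taken with -l, the lock details that follow the first blank line are not printed; open the file at that entry if you need them.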
- Find the culprit code block
From the stack trace, we identified the code block that was running at 100% CPU. In our case, it was a while loop that never terminated once a certain condition was met.
- Fix it!
So we fixed the loop, tested the change, and deployed it to production. Now the JVM process runs so smoothly that even the CPU doesn’t notice it 🙂