Question: How do I troubleshoot high CPU Load? (Resource Issues)

Question

Today’s question comes from Facebook!

“Hi folks, how would I go about troubleshooting this high CPU load? Thanks!”

-Awesome Facebook user! 😉

Depending on the workload, this can be normal. Why? Because that’s what PHP does. A PHP process will use up to 100% of a CPU core because, by default, nothing limits how much of the system’s resources it can use.

Understanding Linux Load Averages

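So let’s look at “load average.” Your load average tells you the average demand on the system over time, which is what you should be concerned about. On Linux it counts not only tasks using or waiting for the CPU but also tasks blocked in uninterruptible I/O, so it isn’t purely a CPU figure. You can read more in this awesome article: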

Brendan Gregg’s Blog – Linux Load Averages: Solving the Mystery

Some interpretations:

If the averages are 0.0, then your system is idle.

If the 1-minute average is higher than the 5 or 15-minute averages, then the load is increasing.

If the 1-minute average is lower than the 5 or 15-minute averages, then the load is decreasing.

If they are higher than your CPU count, then you might have a performance problem (it depends).

Brendan Gregg’s Blog – Linux Load Averages: Solving the Mystery
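The interpretations above can be sketched in a few lines of Python. This is only an illustration, not part of the quoted article: the function name `interpret_load` is made up for this example, and it treats the load as increasing or decreasing only when the 1-minute average is above or below both of the other two, which is one possible reading of the rule.

```python
import os


def interpret_load(one, five, fifteen, cpu_count):
    """Turn the three load averages into short plain-language notes."""
    notes = []
    if one == 0 and five == 0 and fifteen == 0:
        return ["idle"]
    # Assumption: require the 1-minute value to beat BOTH longer averages.
    if one > five and one > fifteen:
        notes.append("increasing")
    elif one < five and one < fifteen:
        notes.append("decreasing")
    # Above the CPU count only hints at a problem; it depends on the workload.
    if max(one, five, fifteen) > cpu_count:
        notes.append("above CPU count")
    return notes


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute averages (Unix only).
    one, five, fifteen = os.getloadavg()
    cpus = os.cpu_count() or 1
    print(f"load: {one:.2f} {five:.2f} {fifteen:.2f} on {cpus} CPU(s)")
    print(interpret_load(one, five, fifteen, cpus) or ["steady"])
```

On a 1-core instance like the one discussed here, any sustained value above 1.0 would show up as "above CPU count".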

So now that you know what’s going on with the resources, what’s actually causing the load spike?

Investigating Server and Website Resource Issues

You now have two options: upgrade your server resources, or review your website’s configuration and resource consumption.

These aren’t in any particular order. Always review everything first, and then come up with a game plan for making changes. You might find in the second half that a single website plugin is causing the majority of the load. You can then decide whether to remove the plugin, deploy the site on its own instance, or upgrade the server.

Server Resources

As you can see from the htop command in the screenshot, this instance has 1 CPU core and 2GB of memory.

Personally, 1 CPU core and 2GB of memory is below the bare minimum I would consider, even for a single site. But I don’t fault this person. This isn’t common knowledge, and again, it’s my personal preference.

However, it’s possible. You would have to put considerable effort into tuning memory and CPU consumption: turning off services, tweaking MySQL buffers, and so on. Typically the time and energy spent isn’t worth it for me.

I’d simply upgrade to an instance with 2 CPU cores and 2GB or 4GB of memory. This leaves plenty of room for system services and MySQL/Redis/Memcache for single or multiple sites. But now we’re getting into instance sizing, which is a whole other conversation.

Review Your Website’s Configuration and Resource Consumption

You can go to each site on the server or just the single site and review the following.

There are four situations to review.

  1. Illegitimate traffic
  2. Bad code on an action
  3. Bad code on a timed event
  4. A legitimate request

Let’s go through them.

Illegitimate traffic.

Whether the traffic is legitimate or malicious, enabling security features at the server level is going to save you resources. Specifically, you want to look at the following.

  • Throttle or block XML-RPC.
  • Throttle or block logins (fail2ban).
  • There’s more, so review your access logs to see what else is hitting the site.
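As a starting point for that access log review, here is a minimal Python sketch that counts requests per client IP and flags hits on `xmlrpc.php` and `wp-login.php`. It assumes the log uses the common/combined format that Nginx and Apache produce by default; the function name `summarize` and the sample lines are made up for this example, so adjust the pattern if your log format differs.

```python
import re
from collections import Counter

# Assumed line layout: IP ident user [time] "METHOD /path HTTP/x" status ...
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*" (\d{3})')

# Endpoints that are common brute-force targets on WordPress sites.
WATCHED = ("/xmlrpc.php", "/wp-login.php")


def summarize(lines):
    """Count requests per client IP and per (IP, watched endpoint)."""
    per_ip = Counter()
    per_endpoint = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        ip, _method, path, _status = match.groups()
        per_ip[ip] += 1
        path_only = path.split("?", 1)[0]  # ignore the query string
        if path_only in WATCHED:
            per_endpoint[(ip, path_only)] += 1
    return per_ip, per_endpoint


if __name__ == "__main__":
    # Made-up sample lines; in practice pass an open log file instead.
    sample = [
        '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "POST /xmlrpc.php HTTP/1.1" 200 512',
        '203.0.113.5 - - [10/Oct/2023:13:55:37 +0000] "POST /wp-login.php HTTP/1.1" 200 512',
        '198.51.100.7 - - [10/Oct/2023:13:55:38 +0000] "GET / HTTP/1.1" 200 1024',
    ]
    per_ip, per_endpoint = summarize(sample)
    print(per_ip.most_common(10))
    print(per_endpoint.most_common(10))
```

A handful of IPs with thousands of hits on either endpoint is a good candidate for throttling or a fail2ban rule.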

Bad code on an action

When any action in WordPress is triggered, such as a page or post update, a plugin might be doing something that takes up resources each time the action occurs. For logged-in users, you can test with the Query Log plugin, but you have to replicate the action yourself. You can see where there are long query times and which external APIs might be slowing down requests, as well as which hooks or functions are triggered when they shouldn’t be.

If it’s a plugin causing the issue, you can reach out to the developer with the information.

Bad code on a timed event

For timed events (such as checking for plugin updates or building a search index), you will need something like New Relic. It will capture all PHP transactions for your site. You can then review each transaction and figure out the culprit. There are other APM (Application Performance Monitoring) tools available, such as Zoho’s Site24x7, just to name one.

A legitimate request.

A majority of legitimate requests should be served from cache, so set up full page caching or find out why the request isn’t being cached. You also have the option of an object cache. But now we’re getting into the whole other subject of caching and performance improvements.
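One quick way to find out whether a request is being cached is to look at the response headers. The sketch below is an assumption-heavy illustration: the header names in `CACHE_HEADERS` are ones commonly set by CDNs and server-side page caches, but your stack may use a different name or none at all, and the function names are invented for this example.

```python
import urllib.request

# Assumed header names; which one appears depends on your cache layer.
CACHE_HEADERS = ("x-cache", "cf-cache-status", "x-cache-status", "x-litespeed-cache")


def cache_verdict(headers):
    """Return 'hit', 'miss' or 'unknown' from a mapping of response headers."""
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    for name in CACHE_HEADERS:
        value = lowered.get(name, "")
        if "hit" in value:
            return "hit"
        if value:
            # Any other value (miss, bypass, expired...) is treated as
            # "not served from cache" in this sketch.
            return "miss"
    return "unknown"


def check(url):
    """Fetch a URL and classify its cache status from the response headers."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return cache_verdict(dict(response.headers.items()))
```

Calling `check()` twice on the same public page is usually more telling than calling it once, since the first request may be the one that populates the cache.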

Other Potential Causes for High Resource Usage

Here are a few other possible causes.

  • Your cron job could be broken.
  • Transients aren’t being saved properly.
  • Your WordPress site is misconfigured.
  • Custom mu-plugins are causing issues.
  • Debug or verbose logging or script actions are enabled (for example, Elementor Debug mode).
  • System tasks are run via cron outside of WordPress.
  • Backup jobs are running.

Conclusion

So what can you take away from this?

Short term. If you want to throw money at the problem, upgrade your server resources in small increments. Monitor and review to see if the problem goes away.

Long term. You need to be aware of your server resource usage and apply best practices to your websites to ensure that bad actors, illegitimate traffic, and bad plugins aren’t causing resource spikes.

I’m hoping to provide more articles on this subject, specifically tracking resources and identifying resource issues. Stay tuned!
