Diagnosing High Load Average on Linux: A Troubleshooting War Story
“The server’s slow and the load average is 40.” It’s one of the most common calls we get, and the cause is almost never what people assume. Here’s the methodical process we use to find what’s really going on.
Load average isn’t CPU usage
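A common misconception is that load average measures CPU usage. It doesn't. On Linux it is an exponentially damped average (over 1, 5 and 15 minutes) of the number of tasks that are either runnable, meaning running or waiting for a CPU, or in uninterruptible sleep (state D). That second category is the key. A high load with low CPU usage usually means tasks are blocked waiting on I/O, typically local disks or network storage such as NFS.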
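To see those two categories directly, here is a minimal shell sketch that walks /proc and counts threads currently in state R or D. The function name `count_load_states` and its optional directory argument are our own, for illustration only. It takes an instantaneous snapshot, whereas /proc/loadavg reports damped averages, so expect the numbers to move together rather than match exactly.

```shell
#!/usr/bin/env bash
# Snapshot of the two task states Linux folds into the load average:
# R (running or runnable) and D (uninterruptible sleep).
# count_load_states and its optional argument are illustrative names;
# the argument lets you point it at a copy of /proc and defaults to /proc.
count_load_states() {
  local proc_root=${1:-/proc}
  local running=0 blocked=0 f line rest state
  # Load is counted per thread, so walk /proc/<pid>/task/<tid>/stat.
  for f in "$proc_root"/[0-9]*/task/[0-9]*/stat; do
    # Threads can exit mid-scan; skip any file we can no longer read.
    read -r line 2>/dev/null < "$f" || continue
    # Field 2 is the command name in parentheses and may contain spaces,
    # so drop everything up to the last ") "; the next field is the state.
    rest=${line##*") "}
    state=${rest%% *}
    case $state in
      R) running=$((running + 1)) ;;
      D) blocked=$((blocked + 1)) ;;
    esac
  done
  echo "R=$running D=$blocked"
}
```

Run `count_load_states` and then `cat /proc/loadavg`. If the D count dominates while the CPUs sit idle, the load is coming from blocked I/O rather than from computation.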
The commands we reach for
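# Quick overview: is it CPU, memory or I/O?
top
# In vmstat, r = runnable, b = uninterruptible sleep, wa = % of CPU time in I/O wait
vmstat 1 5
# Per-disk I/O: are we waiting on storage?
iostat -xz 1
# Which processes are in uninterruptible sleep (state D)? Keep the header row.
ps aux | awk 'NR == 1 || $8 ~ /^D/'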
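Once ps has listed the D-state processes, it helps to ask what they are waiting on. A hedged sketch: `summarise_d_states` is a hypothetical helper that counts D-state threads by their wchan, the kernel function each one is sleeping in, fed by the real ps options `-e` (all processes), `-L` (one row per thread) and `-o` (custom columns). Depending on the kernel, wchan may show only `-`; when it works, the function names are often enough to tell local block I/O apart from a network filesystem.

```shell
#!/usr/bin/env bash
# Tally threads in uninterruptible sleep (state D) by wchan, the kernel
# function each is sleeping in. summarise_d_states is a hypothetical helper
# that expects "STAT WCHAN" pairs on stdin, one thread per line.
summarise_d_states() {
  # Only lines whose STAT starts with D are counted, so a ps header row
  # ("STAT WCHAN") is ignored. Busiest wait points are listed first.
  awk '$1 ~ /^D/ { count[$2]++ }
       END { for (w in count) print count[w], w }' | sort -rn
}

# Usage:
#   ps -eLo stat,wchan:32 | summarise_d_states
```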
What we actually found
CPU was idle, but iostat showed disk utilisation (%util) pinned at 100% with very high await times, meaning each I/O request spent a long time queued and being serviced. The culprit: a nightly backup job had overlapped with a database export, saturating a slow disk. Dozens of processes were stuck in uninterruptible sleep waiting for I/O, inflating the load average even though nothing was computing.
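For scripting the same check, the sketch below picks out devices whose %util meets a threshold in an iostat extended report. The function name and the 90% default are illustrative assumptions, and because the extended-report columns differ between sysstat versions it finds %util by header name rather than by position. Note that the iostat documentation warns %util says little about the limits of devices that serve requests in parallel, such as RAID arrays and SSDs; on a single slow disk like this one it is a reasonable saturation signal.

```shell
#!/usr/bin/env bash
# Flag devices whose %util in an iostat extended report meets a threshold.
# flag_busy_disks and its 90% default are illustrative assumptions.
flag_busy_disks() {
  local threshold=${1:-90}
  awk -v t="$threshold" '
    # Blank lines separate reports; forget the old column layout.
    NF == 0 { util = 0; next }
    # Each device report starts with a header row. Find %util by name,
    # because column positions differ between sysstat versions.
    $1 ~ /^Device/ {
      util = 0
      for (i = 1; i <= NF; i++) if ($i == "%util") util = i
      next
    }
    # Device rows: print the name and %util when it meets the threshold.
    util && NF >= util && ($util + 0) >= (t + 0) {
      printf "%s %%util=%s\n", $1, $util
    }
  '
}
```

For example, `LC_ALL=C iostat -dxy 1 1 | flag_busy_disks 90`, where `-y` skips the since-boot report and `LC_ALL=C` keeps a dot as the decimal separator.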
The fix
- Staggered the backup and export schedules so they no longer collided
- Moved the database’s temporary files to faster storage
- Added I/O-wait alerting so we’d catch it proactively next time
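For the I/O-wait alerting, one option is a check that samples the aggregate cpu line of /proc/stat twice and works out what share of CPU time went to iowait in between. The function names, the 5-second interval and the 20% threshold are assumptions for illustration, not values taken from this incident. The check exits non-zero above the threshold so cron, a systemd timer or a monitoring agent can act on it. The proc(5) man page warns that the iowait counter is not fully reliable, so treat it as one signal alongside iostat latency.

```shell
#!/usr/bin/env bash
# Sketch of an I/O-wait alert check. Function names, the 5-second interval
# and the 20% threshold are illustrative assumptions.

# Integer percentage (rounded down) of CPU time spent in iowait between two
# aggregate "cpu" lines from /proc/stat. Fields after "cpu" are:
# user nice system idle iowait irq softirq steal guest guest_nice
iowait_pct_between() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, " "); split(b, y, " ")
    # Sum user..steal (array slots 2-9); guest is already counted in user.
    total = 0
    for (i = 2; i <= 9; i++) total += y[i] - x[i]
    io = y[6] - x[6]
    pct = (total > 0) ? 100 * io / total : 0
    printf "%d\n", pct
  }'
}

# Sample twice, print the result and exit non-zero above the threshold.
check_iowait() {
  local interval=${1:-5} threshold=${2:-20} before after pct
  read -r before < /proc/stat
  sleep "$interval"
  read -r after < /proc/stat
  pct=$(iowait_pct_between "$before" "$after")
  echo "iowait ${pct}%"
  (( pct < threshold ))
}
```

Calling `check_iowait 5 20` prints the measured share and returns a failure status when iowait reaches 20% or more.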
The takeaway: don’t panic at a big load number. Work the evidence — CPU, memory, then I/O — and let the data point you to the bottleneck instead of guessing.
Need this handled for you?
Server Wizards looks after Linux infrastructure so you don’t have to — proactively, and around the clock.
