[TAG] Tracking load average issues
Ben Okopnik
ben at linuxgazette.net
Fri Jul 27 22:10:00 MSD 2007
On Fri, Jul 27, 2007 at 08:40:28PM +0530, Raj Shekhar wrote:
> in infinite wisdom Neil Youngman spoke thus On 07/18/2007 01:57 PM:
>
> > I googled for "load average", "high load average" and "diagnose load average"
> > and I found very little of use. The one thing I found was that if it's
> > processes stuck waiting on I/O, "ps ax" should show processes in state "D".
> > There are none visible on this box.
>
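On Linux, the load average counts processes in uninterruptible sleep (state D) as well as those that are running or waiting for a CPU (state R). Both are worth watching, and D states are often brief enough that a single 'ps ax' can miss them. Here is a minimal sketch that samples both several times. It assumes a procps-style ps that accepts '-o stat=,pid=,comm='. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: loadprocs, sample_loadprocs.
# On Linux the load average counts tasks in state R (running/runnable)
# and D (uninterruptible sleep, typically waiting on I/O).

# Print state, PID and command name of every R or D process.
# "stat=" etc. give empty column headers, so ps prints no header line.
loadprocs() {
    ps -eo stat=,pid=,comm= | awk '$1 ~ /^[RD]/'
}

# D states are often brief, so take several samples.
# Usage: sample_loadprocs [COUNT [INTERVAL_SECONDS]]  (defaults: 5, 1)
sample_loadprocs() {
    n=${1:-5}
    while [ "$n" -gt 0 ]; do
        date '+%T'          # timestamp each sample
        loadprocs
        n=$((n - 1))
        if [ "$n" -gt 0 ]; then
            sleep "${2:-1}"
        fi
    done
}
```

Running 'sample_loadprocs 20 1' gives twenty one-second samples. Expect the ps command itself to show up as R in each of them.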
> I doubt you can fix it without any monitoring. There are lots of
> lightweight monitoring scripts that you can use (I use Nagios, and I
> can't remember the names of any lightweight scripts off the top of my
> head; Nagios would be overkill for you). In generic terms, what you
> need to do is:
> - install a monitoring script
> - all monitoring systems have hooks that allow you to insert your own
> monitoring scripts
> - monitor the system load; this one-liner prints the 1-minute load
> average:
> "
> uptime |perl -lane 'if (m/.+ (load average: )(.+), (.+), (.+)/) {print $2}'
> "
> - when the number goes above UPPER-LIMIT, run 'ps auxww >>
> LOG_FILE' from your monitoring script itself (where UPPER-LIMIT and
> LOG_FILE are user-supplied values)
You could easily combine the two tasks and automate them, perhaps by
running a cron job. Or, you could wrap a '{ ...; sleep 10; redo }'
construct around the statements below to get snapshots at 10-second
intervals.
``
perl -we'`uptime`=~/average: ([^,]+)/;system "ps auxww>>LOG_FILE" if $1>$LIMIT'
''
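Obviously, you'll want to change the name of the log file and set $LIMIT,
which is a Perl variable here rather than a shell one (e.g., put '$LIMIT=4;'
at the start of the quoted script), or replace it with an explicit value.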
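If you'd rather skip Perl, the same check can be done in plain shell and wrapped in a loop as described above. This is a sketch, not a drop-in tool. It assumes Linux, where the first field of /proc/loadavg is the 1-minute average, and it uses awk for the comparison because POSIX sh arithmetic is integer-only. All the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: load1, above_limit, snapshot_if_high, monitor.
# Assumes Linux: /proc/loadavg starts with the 1-, 5- and 15-minute averages.

# Print the 1-minute load average.
load1() {
    read -r one _rest < /proc/loadavg
    printf '%s\n' "$one"
}

# Succeed if $1 is numerically greater than $2. POSIX sh arithmetic
# is integer-only, so awk does the decimal comparison.
above_limit() {
    awk -v load="$1" -v limit="$2" 'BEGIN { exit !(load + 0 > limit + 0) }'
}

# If the 1-minute load exceeds LIMIT ($1), append a timestamped
# 'ps auxww' snapshot to LOG_FILE ($2).
snapshot_if_high() {
    load=$(load1)
    if above_limit "$load" "$1"; then
        {
            printf '=== %s load=%s\n' "$(date '+%F %T')" "$load"
            ps auxww
        } >> "$2"
    fi
}

# Check every INTERVAL seconds ($3, default 10) until killed.
monitor() {
    while :; do
        snapshot_if_high "$1" "$2"
        sleep "${3:-10}"
    done
}
```

To run it in the background, source the file and call, e.g., 'monitor 4 /var/tmp/load.log 10 &'. Cron can run a job at most once a minute. A crontab line such as '* * * * * . /path/to/loadwatch.sh && snapshot_if_high 4 /var/tmp/load.log' does one check per run. The path and file names are placeholders.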
> - study the log file deeply to find a pattern.
I think that this is the part that Neil was asking about, really. :)
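One way to start on that part is to count how often each command appears in state R or D across all snapshots, then look at whatever sits near the top of the list. This minimal sketch assumes the log holds raw 'ps auxww' output, as the Perl one-liner above produces, and relies on the procps 'ps aux' column layout (STAT in field 8, the command in field 11). summarize_log is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: summarize_log LOG_FILE
# Assumes LOG_FILE holds repeated raw 'ps auxww' output, where field 8 is
# STAT and field 11 is the first word of COMMAND (procps column layout).
summarize_log() {
    awk '
        # First letter of STAT is the process state. Header lines
        # ("STAT") never match, so they are skipped automatically.
        $8 ~ /^[RD]/ { n[substr($8, 1, 1) " " $11]++ }
        # Print "count state command" for each combination seen.
        END { for (k in n) printf "%d %s\n", n[k], k }
    ' "$1" | sort -rn
}
```

Running 'summarize_log LOG_FILE | head' lists the most frequent R/D entries first.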
> - fix the problem
Once the above is done, that's most likely trivial.
--
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *