| Why Monitor Your System? |
|
Proactive Monitoring
anticipating future problems, such as watching storage capacity or CPU usage before they run out
Reactive Monitoring
explaining previous events such as why a crash occurred
|
| System Resources Monitoring: top, htop, vmstat, iostat |
|
Load Average
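load averages are displayed in the top right corner of top
three numbers are shown: the averages over the last 1, 5 and 15 minutes
load is the average number of processes running on or waiting for the CPU (on Linux, processes blocked on disk I/O are counted too)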
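A minimal sketch of reading the same three numbers without top. It assumes a Linux system where /proc/loadavg exists and nproc is installed; load_status is a hypothetical helper name, and comparing load to the core count is only a rule of thumb.

```shell
#!/bin/bash
# load_status LOAD CORES: hypothetical helper, prints "overloaded" when the
# load average is higher than the number of CPU cores, otherwise "ok".
load_status() {
    # awk does the floating-point comparison that [ ] cannot do
    awk -v l="$1" -v c="$2" 'BEGIN {
        if (l > c) print "overloaded"; else print "ok"
    }'
}

# The first three fields of /proc/loadavg are the 1, 5 and 15 minute averages.
if [ -r /proc/loadavg ]; then
    read -r one five fifteen _ < /proc/loadavg
    cores=$(nproc)
    echo "1m=$one 5m=$five 15m=$fifteen cores=$cores -> $(load_status "$one" "$cores")"
fi
```

A load of 4.00 on a 4-core machine means the CPUs are roughly fully occupied; the same number on a single-core machine means processes are queuing.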
vmstat (Virtual Memory Statistics)
top gives a snapshot
vmstat provides a trend
vmstat 1
updates every second
columns to watch
iostat (Input/Output Statistics)
tool focuses on storage devices such as hard drives
iostat -xz 1
if %util is near 100% the drive is likely a bottleneck |
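The %util check can be automated with a small filter. This is a sketch, not part of iostat itself: busy_devices is a hypothetical helper, and it assumes the extended report has a header row starting with "Device" that contains a "%util" column, which it locates by name rather than by position.

```shell
#!/bin/bash
# busy_devices THRESHOLD: hypothetical helper. Reads `iostat -x` style text on
# stdin and prints "device %util" for every device at or above THRESHOLD.
busy_devices() {
    awk -v limit="$1" '
        # header row: remember which column is %util
        $1 ~ /^Device/ { for (i = 1; i <= NF; i++) if ($i == "%util") col = i; next }
        # a blank line ends the device table of the current report
        NF == 0 { col = 0 }
        # device row inside a table: compare its %util with the threshold
        col && NF >= col && $col + 0 >= limit { print $1, $col }
    '
}

# Usage (three one-second samples, needs the sysstat package):
#   iostat -xz 1 3 | busy_devices 90
```

Finding the column by name keeps the filter working when the number of columns in the extended report differs between sysstat versions.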
| Memory Usage: free, Understanding RAM and Swap |
|
performance drops when the system swaps between RAM and disk
the free command
free -h
output
              total        used        free      shared  buff/cache   available
Mem:           15Gi       4.0Gi       8.0Gi       1.0Gi       3.0Gi       9.0Gi
Swap:         2.0Gi          0B       2.0Gi
columns
Swap Memory
swap is space on a drive used as emergency RAM
if swap usage is high, the server has run out of available RAM
the server will be sluggish and unresponsive |
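To turn "swap is high" into a number, the swap fields can be read from /proc/meminfo. This is a sketch assuming a Linux system; swap_used_percent is a hypothetical helper name, and the result is truncated to a whole percent.

```shell
#!/bin/bash
# swap_used_percent: hypothetical helper. Reads /proc/meminfo style text on
# stdin and prints the share of swap in use as a whole number.
swap_used_percent() {
    awk '
        $1 == "SwapTotal:" { total = $2 }
        $1 == "SwapFree:"  { unused = $2 }
        END {
            if (total > 0) printf "%d\n", (total - unused) * 100 / total
            else print 0    # no swap configured
        }
    '
}

if [ -r /proc/meminfo ]; then
    echo "swap used: $(swap_used_percent < /proc/meminfo)%"
fi
```

With the free -h sample in this section (2.0Gi total, 0B used) the helper would report 0.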
| Disk I/O Monitoring |
|
want to know which app is writing to disk
iotop works like top but for disks
sudo iotop
lists processes sorted by how much they are reading/writing
if the system feels slow but CPU usage is low, run iotop
might find a backup script or a database query thrashing the drive |
| Understanding System Logs: /var/log/ |
|
The Big Two
most Linux system logs are written to the /var/log directory
on Ubuntu/Debian systems the main log file is /var/log/syslog
on Red Hat/CentOS systems the main log file is /var/log/messages
Authentication Log
to view logins and login attempts
Application Log
most major applications keep their own subdirectories
|
| Key Log Files: syslog, auth.log, kern.log, dmesg |
|
syslog (System Log)
a catch-all
format
Date Time Hostname Process[PID]: Message
example
Oct 30 10:00:01 server cron[123]: (root) CMD (backup.sh)
auth.log (Security Log)
tracks sudo usage, SSH logins and user creation
example
Oct 30 10:05:00 server sshd[456]: Failed password for invalid user admin from 192.168.1.50
kern.log (Kernel Log)
messages directly from the kernel
dmesg (Boot Log)
command prints the kernel's 'ring buffer'
shows exactly what happened during the boot process (detecting hard drives, loading drivers)
dmesg | less
or, to show human-readable timestamps
dmesg -T |
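Returning to the auth.log example in this section: failed SSH logins can be counted per source address. This is a sketch that assumes lines shaped like the "Failed password ... from ADDRESS" example; failed_logins_by_ip is a hypothetical helper name, and the log path in the usage comment applies to Ubuntu/Debian style systems.

```shell
#!/bin/bash
# failed_logins_by_ip: hypothetical helper. Reads auth.log style lines on stdin
# and prints "count address" for failed password attempts, busiest first.
failed_logins_by_ip() {
    awk '/Failed password/ {
        # the address is the word after the last "from" on the line
        for (i = NF - 1; i >= 1; i--)
            if ($i == "from") { count[$(i + 1)]++; break }
    }
    END { for (ip in count) print count[ip], ip }' | sort -rn
}

# Usage (reading the log usually needs root):
#   sudo cat /var/log/auth.log | failed_logins_by_ip | head
```

An address with a large count is a candidate for closer inspection, for example a brute-force attempt.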
| Real-time Log Monitoring: tail -f |
|
tail -f is the primary tool for real-time monitoring
covered in Reading File Contents: cat, less, more, head, tail
scenario: you restart the web server but it fails to start
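For a scenario like this, the live stream can be narrowed to lines that look like problems. This is a sketch: filter_errors is a hypothetical helper, the pattern is only illustrative, and the log path in the usage comment depends on the distribution and on which service is failing.

```shell
#!/bin/bash
# filter_errors: hypothetical helper. Keeps only lines that mention an error
# or failure, ignoring case. --line-buffered makes grep print each match
# immediately instead of waiting for its output buffer to fill.
filter_errors() {
    grep --line-buffered -iE 'error|fail'
}

# Usage (runs until interrupted with Ctrl+C); restart the service in a second
# terminal and watch the matching lines appear:
#   tail -f /var/log/syslog | filter_errors
```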
|
| Log Rotation and Management: logrotate |
|
obvious problem: logging forever eventually fills the disk
logrotate runs daily via cron
checks the logs
if a log is too big or too old the tool will
|
| Setting Up Simple Alerts and Notifications |
|
set up simple alerts using shell scripts and cron
Example: The Disk Space Alerter
create a script check_disk.sh
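#!/bin/bash
# Use% of the root filesystem as a bare number; -P keeps each filesystem on one line
USAGE=$(df -P / | awk 'NR==2 { print $5 }' | sed 's/%//g')
THRESHOLD=90
# placeholder address; replace with your own
RECIPIENT="admin@example.com"
if [ "$USAGE" -gt "$THRESHOLD" ]; then
    echo "Warning: Disk usage is at ${USAGE}%" | mail -s "Disk Alert" "$RECIPIENT"
fi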
add script to crontab to run every hour
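A sketch of the hourly crontab entry. The install path /usr/local/bin/check_disk.sh is an assumption, and the script must be executable (chmod +x) for cron to run it.

```shell
#!/bin/bash
# Hypothetical install location; adjust to wherever check_disk.sh actually lives.
SCRIPT=/usr/local/bin/check_disk.sh

# Crontab fields: minute hour day-of-month month day-of-week command.
# "0 * * * *" means minute 0 of every hour.
CRON_LINE="0 * * * * $SCRIPT"
echo "$CRON_LINE"

# To install it, append the line to the current user's crontab:
#   ( crontab -l 2>/dev/null; echo "$CRON_LINE" ) | crontab -
# or paste the line by hand after running: crontab -e
```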
|
| Summary |
covers
key points
|