| Developing a Troubleshooting Mindset |
four steps
- Identify the Symptom - narrow down the suspect
- Isolate the Variable - change one thing at a time
- Check the Evidence - read the logs
- Test the Fix - once you believe the problem is resolved, reboot; does it stay fixed?
|
| Boot Problems: GRUB and Recovery Mode |
grub rescue>
bootloader is corrupted or cannot find the partition with Linux kernel
-
Cause - Windows updates can overwrite GRUB on a machine that dual boots Windows
-
Fix - boot from Linux installer USB stick
open a terminal
install a tool called boot-repair
run boot-repair
app will scan the drive, find the Linux partition, and reinstall GRUB automatically
see "Welcome to emergency mode!" and are asked for the root password
-
Cause - usually a filesystem error or bad entry in /etc/fstab
if a new drive is added to fstab then unplugged, Linux will refuse to boot because a required disk is missing
-
Fix
- enter the root password
-
remount root filesystem as read/write
mount -o remount,rw /
-
edit fstab
nano /etc/fstab
- comment bad line
- save and reboot
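The "comment bad line" step can also be done without an editor. The following is a minimal sketch, not a standard tool: `comment_out_mount` is a hypothetical helper name, and `/mnt/usb` in the usage note is an assumed mount point for the unplugged drive. It keeps a backup copy before changing anything.

```shell
#!/bin/sh
# Comment out every active fstab entry whose mount point (field 2)
# equals the given path. The original is saved as <file>.bak first.
# Usage: comment_out_mount <fstab-file> <mount-point>
comment_out_mount() {
    file=$1
    mount_point=$2
    cp "$file" "$file.bak" || return 1
    awk -v mp="$mount_point" '
        # Skip lines that are already comments; prefix matching entries with "#".
        $1 !~ /^#/ && NF >= 2 && $2 == mp { print "#" $0; next }
        { print }
    ' "$file.bak" > "$file"
}
```

For example, `comment_out_mount /etc/fstab /mnt/usb` would disable the entry for that mount point, assuming the root filesystem has already been remounted read/write.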
forgot password and can't login
- reboot
- at the GRUB menu press e to edit the boot options
- find line starting with linux
-
at the end of line add
init=/bin/bash
- press Ctrl+X or F10 to boot with the edited options
- will drop into a root shell without a password
-
remount the drive
mount -o remount,rw /
-
change password
passwd <username>
-
reboot
exec /sbin/init
|
| System Performance Issues: Identifying Bottlenecks |
system is running slow
-
Check Load - run uptime
compare load average to number of cores
-
Check CPU - run top
is a process hogging the CPU?
yes - kill the runaway process
no - CPU is idle but system is slow
check I/O
-
Check I/O - run top or iostat
check wa (I/O wait) percentage
if wa is high CPU is waiting on disk
run iotop to find the culprit
-
Check RAM - run free -h
if swap usage is high and free RAM is near zero, the system is thrashing
close some apps or buy more RAM
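The "compare load average to number of cores" check can be sketched as a small shell function. The function names below are hypothetical, and the sketch assumes Linux, where `/proc/loadavg` and `nproc` are available. The threshold of one runnable task per core is a rule of thumb, not a hard limit.

```shell
#!/bin/sh
# Succeed (exit 0) when the load average is greater than the core count.
# Usage: load_exceeds_cores <load-average> <core-count>
load_exceeds_cores() {
    awk -v load="$1" -v cores="$2" 'BEGIN { exit !(load > cores) }'
}

# Read the live 1-minute load average and core count, then report.
check_load() {
    load=$(cut -d ' ' -f 1 /proc/loadavg)
    cores=$(nproc)
    if load_exceeds_cores "$load" "$cores"; then
        echo "overloaded: load $load on $cores cores"
    else
        echo "ok: load $load on $cores cores"
    fi
}
```

Running `check_load` prints one line; if it reports an overload, continue with top to see whether the CPU or the disk is the bottleneck.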
|
| Network Connectivity Problems |
can't connect to the Internet
-
Layer 1 Check - check by running
ip link
status will be UP or DOWN
-
IP Check - check IP address
ip addr
no IP address usually means the DHCP client failed
request a new one
sudo dhclient -v
-
Gateway Check - check router
ip route
ping the IP address returned
failure means LAN failure
-
Internet Check - ping Google's public DNS server by IP address
ping 8.8.8.8
failure means router not connected to ISP
-
DNS Check - ping Google by name
ping google.com
failure may mean DNS settings are wrong
DNS services can also go down briefly, in which case the problem is not local
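The checks above run in a fixed order, and the first one that fails points at the broken layer. A minimal sketch of that idea follows; `first_failure` is a hypothetical helper, and the `label=command` argument format is an assumption made for this example.

```shell
#!/bin/sh
# Run checks in order and print the label of the first one that fails,
# or "ok" when all of them pass. Each argument has the form label=command.
first_failure() {
    for check in "$@"; do
        label=${check%%=*}   # text before the first "="
        cmd=${check#*=}      # text after the first "="
        if ! sh -c "$cmd" >/dev/null 2>&1; then
            echo "$label"
            return 1
        fi
    done
    echo "ok"
}

# Example ladder (commented out so that nothing touches the network here):
# gw=$(ip route | awk '/^default/ {print $3; exit}')
# first_failure "gateway=ping -c 1 -W 2 $gw" \
#               "internet=ping -c 1 -W 2 8.8.8.8" \
#               "dns=ping -c 1 -W 2 google.com"
```

With the example ladder, an output of `dns` would mean the gateway and the Internet are reachable by IP address but name resolution fails.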
|
| Package and Dependency Conflicts |
"E: Unable to locate package" or "E: Unable to correct problems, you have held broken packages."
-
update the cache
sudo apt update
-
fix broken installation
first run
sudo dpkg --configure -a
then
sudo apt install -f
-
fix locked files
"Could not get lock /var/lib/dpkg/lock-frontend"
means another package manager is running
wait 5 minutes
if problem persists run
ps aux | grep apt
kill the stuck process
only after no apt or dpkg process remains, delete the lockfile
sudo rm /var/lib/dpkg/lock-frontend
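Deleting a lock while a package manager is still working can corrupt the package database, so it helps to confirm that nothing is running first. The sketch below assumes `pgrep` (from the procps package) is installed; `any_running` is a hypothetical helper name.

```shell
#!/bin/sh
# Exit 0 if a process with one of the given exact names is running,
# 1 if none is, and 2 if pgrep is missing, so that "cannot tell"
# is never mistaken for "not running".
any_running() {
    command -v pgrep >/dev/null 2>&1 || return 2
    for name in "$@"; do
        pgrep -x "$name" >/dev/null && return 0
    done
    return 1
}

# Example (commented out; it would modify the system):
# status=0
# any_running apt apt-get dpkg || status=$?
# [ "$status" -eq 1 ] && sudo rm /var/lib/dpkg/lock-frontend
```

The example removes the lockfile only when the check positively reports that no package tool is running.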
|
| Permission Denied Errors |
common error
-
check ownership
ls -l
does user running app own the file?
use chown to change ownership
-
check permissions
are permissions rw- with no x when the file needs to be executed?
use chmod to set the x (execute) bit
-
check directories
to enter a folder execute permission is needed
-
check AppArmor/SELinux
if permissions are OK (rwxrwxrwx) but access is still denied, the cause may be a MAC (Mandatory Access Control) policy
check
dmesg or /var/log/audit/audit.log
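Because every directory on the way to a file needs the execute bit, it can help to list the permissions of each path component in one go. The following is a rough sketch; `path_modes` is a hypothetical helper that relies only on `ls -ld`. Where util-linux is installed, `namei -l <path>` gives a similar listing.

```shell
#!/bin/sh
# Print mode, owner and path for every component of a path,
# starting at the target and walking up to the top directory.
path_modes() {
    p=$1
    while :; do
        ls -ld "$p" | awk -v path="$p" '{ print $1, $3, path }'
        case $p in
            /|.) break ;;   # reached the top of an absolute or relative path
        esac
        p=$(dirname "$p")
    done
}
```

A directory in the output whose mode lacks x for the relevant user, group or other is a likely cause of the "Permission denied" error.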
|
| Disk Space Issues: Finding and Removing Large Files |
"No space left on device"
-
verify
run
df -h
confirm which partition is full
-
locate
go to root of partition and run
sudo du -sh * | sort -h
lists directories by size, largest last
cd to largest-sized folder
cd <folder>
again run
sudo du -sh * | sort -h
cd to largest-sized subfolder
cd <subfolder>
examine folder contents
ls -lh
-
fix
generally large file will be a log file
delete the file
sudo rm /var/<folder>/<filename>
if the file is still open in a running service, the space is not freed until that service restarts
restart the log's service
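The locate step boils down to sorting `du` output and looking at the biggest entries. A small sketch of that step follows; `largest_entries` is a hypothetical helper that expects "size, tab, path" lines such as those printed by `du -k`.

```shell
#!/bin/sh
# Read "size<TAB>path" lines on stdin (for example from du -k)
# and print the N largest entries, biggest first. N defaults to 5.
largest_entries() {
    sort -rn | head -n "${1:-5}"
}

# Example (commented out; needs sudo to read everything under /var):
# sudo du -sk /var/* | largest_entries 3
```

Sizes are in kilobytes here rather than in the human-readable units of `du -sh`, which keeps the numeric sort simple.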
|
| Service Failures and Log Analysis |
"Service failed to start"
-
check status
systemctl status apache2
read error lines
-
check journal
journalctl -u apache2 -e
specific error usually at end
Syntax error on line 54 of /etc/apache2/apache2.conf
- fix config - correct the error
-
restart
sudo systemctl restart apache2
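When the journal reports a message like the one above, the offending line can be pulled straight out of the config file. This is a sketch under the assumption that the message has the form "... on line N of /path/to/file"; `show_error_line` is a hypothetical helper, not part of systemd or Apache.

```shell
#!/bin/sh
# Given a message of the form "... on line N of /path/to/file",
# print line N of that file, prefixed with its line number.
show_error_line() {
    msg=$1
    line=$(printf '%s\n' "$msg" | sed -n 's/.* on line \([0-9][0-9]*\) of .*/\1/p')
    file=$(printf '%s\n' "$msg" | sed -n 's/.* on line [0-9][0-9]* of \([^ :]*\).*/\1/p')
    # Give up if the message does not match the expected form.
    [ -n "$line" ] && [ -n "$file" ] || return 1
    printf '%s: ' "$line"
    sed -n "${line}p" "$file"
}
```

For example, `show_error_line "Syntax error on line 54 of /etc/apache2/apache2.conf"` would print line 54 of that file. On Debian-based systems, `sudo apache2ctl configtest` can then confirm the corrected config before the restart.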
|
| Summary |
covered
- boot - GRUB repair, emergency mode and password reset to fix broken systems
- performance - identifying bottlenecks
- networks - checking link, IP address, gateway, Internet and DNS in order
- permissions - ownership, bits and Mandatory Access Control
- logs - use journalctl to find problems
key points
- don't panic - follow logical path
- journalctl -xe - first command to run when a service fails
- df -h / du -sh - tools for disk-space issues
- ping / ip route / dig - network tools
- top / iotop - identifying bottlenecks
- Single User Mode - using GRUB to reset root passwords
- logs - the answer is almost always in the logs
|