Troubleshooting performance-related issues in the IT world is always challenging, and it can be frustrating if you are not aware of the right tools.
If you work in production support, you will most likely have to deal with performance-related issues on Linux. Let’s go through some of the most commonly used Linux command-line utilities for diagnosing them.
Note: Some of the commands listed below may not be installed by default, so you may have to install them manually.
lsof
lsof stands for “list open files”. It lists open files along with the processes and users that opened them, which makes it convenient in several scenarios.
To list all the files opened by a particular PID
# lsof -p PID
Count the number of files opened by a process (the count includes the lsof header line)
[root@localhost ~]# lsof -p 4271 | wc -l
34
[root@localhost ~]#
Check the currently opened log files
# lsof -p PID | grep log
Find out the port number used by the process
# lsof -i -P | grep $PID
[root@localhost ~]# lsof -i -P |grep 4271
nginx 4271 root 6u IPv4 51306 0t0 TCP *:80 (LISTEN)
nginx 4271 root 7u IPv4 51307 0t0 TCP *:443 (LISTEN)
[root@localhost ~]#
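A plain grep for the PID matches that number anywhere on the line, for example inside a device number or another port. The following is a minimal sketch of a stricter filter, assuming the default lsof -i -P column layout (PID in the second column, socket name and state at the end of the line). The function name listening_ports is a hypothetical helper, not part of lsof.

```shell
# listening_ports PID
# Reads `lsof -i -P` output on stdin and prints the port of every
# socket that the given PID holds in the LISTEN state.
# Assumes the default lsof layout: PID in column 2, state in the last column.
listening_ports() {
    awk -v pid="$1" '
        $2 == pid && $NF == "(LISTEN)" {
            # The NAME column looks like "*:80"; the port follows the last colon.
            n = split($(NF - 1), parts, ":")
            print parts[n]
        }'
}
```

A typical call would be: lsof -i -P | listening_ports 4271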
Check out more lsof command examples.
pidstat
pidstat can be used to monitor tasks managed by the Linux kernel. It makes troubleshooting I/O-related issues much easier.
List I/O statistics for all PIDs
# pidstat -d
To display I/O stats for a particular PID
# pidstat -p 4271 -d
If you are troubleshooting a process in real time, you can monitor its I/O at a fixed interval. The example below reports every 5 seconds.
[root@localhost ~]# pidstat -p 4362 -d 5
Linux 3.10.0-327.13.1.el7.x86_64 (localhost.localdomain) 08/13/2016 _x86_64_ (2 CPU)

07:01:30 PM UID PID kB_rd/s kB_wr/s kB_ccwr/s Command
07:01:35 PM 0 4362 0.00 0.00 0.00 nginx
07:01:40 PM 0 4362 0.00 0.00 0.00 nginx
07:01:45 PM 0 4362 0.00 0.00 0.00 nginx
07:01:50 PM 0 4362 0.00 0.00 0.00 nginx
top
top is probably one of the most used commands on Linux. It displays system summary information and current utilization.
Just executing top shows you CPU utilization, process details, the number of tasks, memory utilization, the number of zombie processes, etc.
top - 11:48:43 up 13 days, 17:25, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 90 total, 2 running, 88 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.3 us, 0.0 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 1829.7 total, 388.1 free, 220.3 used, 1221.4 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 1369.4 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 186820 13400 9092 S 0.0 0.7 0:10.87 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.13 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par_gp
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H
8 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_percpu_wq
9 root 20 0 0 0 0 S 0.0 0.0 0:07.35 ksoftirqd/0
10 root 20 0 0 0 0 R 0.0 0.0 0:07.30 rcu_sched
11 root rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
12 root rt 0 0 0 0 S 0.0 0.0 0:00.50 watchdog/0
13 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/0
15 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kdevtmpfs
16 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 netns
17 root 20 0 0 0 0 S 0.0 0.0 0:00.68 kauditd
18 root 20 0 0 0 0 S 0.0 0.0 0:00.25 khungtaskd
19 root 20 0 0 0 0 S 0.0 0.0 0:00.00 oom_reaper
20 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 writeback
21 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kcompactd0
22 root 25 5 0 0 0 S 0.0 0.0 0:00.00 ksmd
23 root 39 19 0 0 0 S 0.0 0.0 0:05.63 khugepaged
24 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 crypto
25 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kintegrityd
26 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kblockd
To display process details for a specific user
# top -u username
To kill a process, run top and press k. It will prompt you to enter the PID to be killed.
top - 11:49:39 up 13 days, 17:26, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 91 total, 1 running, 90 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.3 us, 0.0 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 1829.7 total, 386.9 free, 221.4 used, 1221.4 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 1368.3 avail Mem
PID to signal/kill [default pid = 21261]
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5834 nginx 20 0 148712 7428 4800 S 0.0 0.4 0:02.37 nginx
ps
ps stands for process status and is a widely used command to get a snapshot of running processes. It is very useful for finding out whether a process is running and, if so, its PID.
To find the PID and process details by keyword
[root@lab ~]# ps -ef|grep nginx
root 5833 1 0 May24 ? 00:00:00 nginx: master process /usr/sbin/nginx
nginx 5834 5833 0 May24 ? 00:00:02 nginx: worker process
root 21267 18864 0 11:50 pts/0 00:00:00 grep --color=auto nginx
[root@lab ~]#
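In the output above, the grep command itself shows up as a match. The following is a minimal sketch of a filter that looks only at the command column and ignores grep lines, assuming the ps -ef layout shown above (PID in column 2, command starting at column 8). The function name pids_matching and the variable PS_PATTERN are hypothetical names chosen for this example.

```shell
# pids_matching PATTERN
# Reads `ps -ef` output on stdin and prints the PID of every process
# whose command (column 8 onward) matches PATTERN.
pids_matching() {
    # The pattern travels in the environment rather than on the awk
    # command line, so this awk process does not match the pattern itself.
    PS_PATTERN="$1" awk '
        NR > 1 && $8 != "grep" {          # skip the header row and any grep process
            cmd = $8
            for (i = 9; i <= NF; i++) cmd = cmd " " $i
            if (cmd ~ ENVIRON["PS_PATTERN"]) print $2
        }'
}
```

A typical call would be: ps -ef | pids_matching nginx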
tcpdump
Troubleshooting network issues is always challenging, and one of the essential commands for the job is tcpdump.
You can use tcpdump to capture the network packets on a network interface.
To capture the packets on a particular network interface
[root@lab ~]# tcpdump -i eth0 -w /tmp/capture
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C9 packets captured
16 packets received by filter
0 packets dropped by kernel
[root@lab ~]#
As you can see, the command above captured the traffic on the eth0 interface and wrote it to /tmp/capture.
To capture network traffic between a source and a destination IP
# tcpdump src host $SRC_IP and dst host $DST_IP
Capture network traffic for destination port 443
# tcpdump dst port 443
tcpdump: data link type PKTAP
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on pktap, link-type PKTAP (Packet Tap), capture size 262144 bytes
12:02:30.833845 IP 192.168.1.2.49950 > ec2-107-22-185-206.compute-1.amazonaws.com.https: Flags [.], ack 421458229, win 4096, length 0
12:02:32.076893 IP 192.168.1.2.49953 > 104.25.133.107.https: Flags [S], seq 21510813, win 65535, options [mss 1460,nop,wscale 5,nop,nop,TS val 353259990 ecr 0,sackOK,eol], length 0
12:02:32.090389 IP 192.168.1.2.49953 > 104.25.133.107.https: Flags [.], ack 790725431, win 8192, length 0
12:02:32.090630 IP 192.168.1.2.49953 > 104.25.133.107.https: Flags [P.], seq 0:517, ack 1, win 8192, length 517
12:02:32.109903 IP 192.168.1.2.49953 > 104.25.133.107.https: Flags [.], ack 147, win 8187, length 0
Read the captured file
# tcpdump -r filename
Ex: to read the file captured above
# tcpdump -r /tmp/capture
Learn more about tcpdump to capture and analyze the network traffic.
iostat
iostat stands for input/output statistics and is often used to diagnose performance issues with storage devices. It can report CPU, device, and network file system utilization.
Display disk I/O statistics
[root@localhost ~]# iostat -d
Linux 3.10.0-327.13.1.el7.x86_64 (localhost.localdomain) 08/13/2016 _x86_64_ (2 CPU)

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 1.82 55.81 12.63 687405 155546

[root@localhost ~]#
Display CPU statistics
[root@localhost ~]# iostat -c
Linux 3.10.0-327.13.1.el7.x86_64 (localhost.localdomain) 08/13/2016 _x86_64_ (2 CPU)

avg-cpu: %user %nice %system %iowait %steal %idle
0.59 0.02 0.33 0.54 0.00 98.52

[root@localhost ~]#
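When the CPU report feeds a script or an alert, the %iowait value is usually the figure of interest for storage problems. The following is a minimal sketch that pulls it out of the report, assuming the layout shown above (an avg-cpu: header line followed by one line of values, with %iowait in fourth position). The function name iowait_pct is a hypothetical helper.

```shell
# iowait_pct
# Reads `iostat -c` output on stdin and prints the %iowait value of
# each CPU report it finds.
iowait_pct() {
    awk '
        /^avg-cpu:/ {
            # The values sit on the line right after the header.
            if ((getline line) > 0) {
                split(line, v, " ")
                print v[4]        # %user %nice %system %iowait ...
            }
        }'
}
```

A typical call would be: iostat -c | iowait_pct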
ldd
ldd stands for list dynamic dependencies; it shows the shared libraries needed by a program or another shared library. The ldd command can be handy for diagnosing application startup problems.
If a program is not starting because a dependency is unavailable, you can use ldd to find out which shared libraries it is looking for.
[root@localhost sbin]# ldd httpd
linux-vdso.so.1 => (0x00007ffe7ebb2000)
libpcre.so.1 => /lib64/libpcre.so.1 (0x00007fa4d451e000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fa4d42f9000)
libaprutil-1.so.0 => /lib64/libaprutil-1.so.0 (0x00007fa4d40cf000)
libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007fa4d3e98000)
libexpat.so.1 => /lib64/libexpat.so.1 (0x00007fa4d3c6e000)
libdb-5.3.so => /lib64/libdb-5.3.so (0x00007fa4d38af000)
libapr-1.so.0 => /lib64/libapr-1.so.0 (0x00007fa4d3680000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa4d3464000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fa4d325f000)
libc.so.6 => /lib64/libc.so.6 (0x00007fa4d2e9e000)
liblzma.so.5 => /lib64/liblzma.so.5 (0x00007fa4d2c79000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa4d4a10000)
libuuid.so.1 => /lib64/libuuid.so.1 (0x00007fa4d2a73000)
libfreebl3.so => /lib64/libfreebl3.so (0x00007fa4d2870000)
[root@localhost sbin]#
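In the output above every library resolves to a path. When a dependency is missing, ldd prints "not found" in place of the path, so a long listing can be reduced to just the problem entries. The following is a minimal sketch; the function name missing_libs is a hypothetical helper.

```shell
# missing_libs
# Reads `ldd` output on stdin and prints the name of every shared
# library that the dynamic loader could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

A typical call would be: ldd httpd | missing_libs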
netstat
netstat (Network Statistics) is a popular command to print network connections and interface statistics, and to troubleshoot various network-related issues.
To show stats of all protocols
# netstat -s
You can use grep to check for errors
[root@localhost sbin]# netstat -s | grep error
0 packet receive errors
0 receive buffer errors
0 send buffer errors
[root@localhost sbin]#
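For a quick health check it can be enough to know whether the error counters add up to zero. The following is a minimal sketch, assuming the counter format shown above, where each line starts with a number followed by a description. Counters printed in another format are ignored, and the function name total_errors is a hypothetical helper.

```shell
# total_errors
# Reads `netstat -s` output on stdin and prints the sum of all
# counters whose description mentions "error".
total_errors() {
    awk '
        /error/ && $1 ~ /^[0-9]+$/ { sum += $1 }   # only lines that start with a number
        END { print sum + 0 }                      # print 0 when nothing matched
    '
}
```

A typical call would be: netstat -s | total_errors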
To show the kernel routing table
[root@localhost sbin]# netstat -r
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
default         gateway         0.0.0.0         UG        0 0          0 eno16777736
172.16.179.0    0.0.0.0         255.255.255.0   U         0 0          0 eno16777736
192.168.122.0   0.0.0.0         255.255.255.0   U         0 0          0 virbr0
[root@localhost sbin]#
Explore more netstat command examples.
free
If your Linux server is running out of memory, or you just want to find out how much memory is still available out of the total, the free command will help you.
[root@localhost sbin]# free -g
              total        used        free      shared  buff/cache   available
Mem:              5           0           3           0           1           4
Swap:             5           0           5
[root@localhost sbin]#
-g shows the values in GB. As you can see, total memory is 5 GB and 3 GB is free.
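Because -g prints whole gigabytes, the figures are coarse on smaller servers, and free -m gives more usable numbers for a script. The following is a minimal sketch that turns the Mem: line into a single percentage of memory still available, assuming the six-column layout shown above with "available" as the last column (older versions of free use a different layout). The function name mem_available_pct is a hypothetical helper.

```shell
# mem_available_pct
# Reads `free -m` output on stdin and prints available memory as a
# whole-number percentage of total memory (fraction truncated).
mem_available_pct() {
    awk '
        # On the Mem: line, column 2 is "total" and column 7 is "available".
        $1 == "Mem:" && $2 > 0 { printf "%d\n", ($7 * 100) / $2 }
    '
}
```

A typical call would be: free -m | mem_available_pct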
sar
sar (System Activity Report) is helpful for collecting a number of reports, including CPU, memory, and device load.
Just executing the sar command shows you system utilization for the entire day.
By default, it records utilization every 10 minutes. If you need a shorter interval in real time, you can use it as below.
Show the CPU report 2 times, every 3 seconds
[root@localhost sbin]# sar 3 2
Linux 3.10.0-327.13.1.el7.x86_64 (localhost.localdomain) 08/13/2016 _x86_64_ (2 CPU)

11:14:02 PM CPU %user %nice %system %iowait %steal %idle
11:14:05 PM all 1.83 0.00 0.50 0.17 0.00 97.51
11:14:08 PM all 1.50 0.00 0.17 0.00 0.00 98.33
Average: all 1.67 0.00 0.33 0.08 0.00 97.92
[root@localhost sbin]#
Show memory usage report
# sar -r
Show network report
# sar -n ALL
ipcs
ipcs (interprocess communication status) provides a report on semaphores, shared memory, and message queues.
To list the message queues
# ipcs -q
To list the semaphores
# ipcs -s
To list the shared memory segments
# ipcs -m
To display the current usage status of IPC
[root@localhost sbin]# ipcs -u

------ Messages Status --------
allocated queues = 0
used headers = 0
used space = 0 bytes

------ Shared Memory Status --------
segments allocated 5
pages allocated 2784
pages resident 359
pages swapped 0
Swap performance: 0 attempts 0 successes

------ Semaphore Status --------
used arrays = 0
allocated semaphores = 0

[root@localhost sbin]#
ioping
ioping is an external command that you have to install separately. It can be very handy for monitoring disk I/O latency in real time.
Conclusion
I hope the above commands help you in various situations in your system administration job. They are good to use on demand. However, if you need to monitor Linux servers all the time, you should consider using server monitoring software.
And, to learn more about Linux performance, you can check out this Udemy course.