Troubleshooting performance-related issues in IT is always challenging, and it gets frustrating if you don't know the right tools.

If you work in production support, you will most likely have to deal with performance-related issues on Linux. Let’s go through some of the most commonly used Linux command-line utilities for diagnosing them.

Note: Some of the commands listed below may not be installed by default, so you may have to install them manually.
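
A quick way to see which of these utilities are missing is to loop over their names with `command -v`. The sketch below is only illustrative: the function name check_tools is made up for this example, and the package that provides a missing tool depends on your distribution (pidstat, iostat and sar usually ship together in the sysstat package).

```shell
# check_tools NAME...: report whether each command is found on PATH
# (hypothetical helper, not part of any of the tools below).
check_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: installed"
        else
            echo "$tool: missing"
        fi
    done
}

check_tools lsof pidstat top ps tcpdump iostat ldd netstat free sar ipcs ioping
```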

lsof

lsof stands for “list open files”. It lists open files together with the processes that opened them, which is convenient in scenarios such as the following.

To list all the files opened by a particular PID

# lsof -p PID

Count the number of files opened by a process

[root@localhost ~]# lsof -p 4271 | wc -l
34
[root@localhost ~]#

Check the currently opened log file

# lsof -p PID | grep log

Find out the port number used by the process

# lsof -i -P | grep $PID

[root@localhost ~]# lsof -i -P |grep 4271
nginx     4271   root   6u IPv4 51306     0t0 TCP *:80 (LISTEN)
nginx     4271   root   7u IPv4 51307     0t0 TCP *:443 (LISTEN)
[root@localhost ~]#
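
One caveat with piping to grep is that it matches the PID anywhere on the line, so 4271 would also match a port such as 42710. A minimal sketch that compares the PID column exactly and prints only the listening ports could look like this. The helper name listening_ports is hypothetical, and the code assumes the default lsof column layout shown above.

```shell
# listening_ports PID: read `lsof -i -P` output on stdin and print the
# port numbers that PID is listening on (hypothetical helper).
listening_ports() {
    awk -v pid="$1" '
        $2 == pid && $NF == "(LISTEN)" {
            n = split($(NF - 1), parts, ":")   # NAME column, e.g. *:80
            print parts[n]                      # last piece is the port
        }'
}

# Sample lines taken from the output above; on a live system use:
#   lsof -i -P -n | listening_ports 4271
printf '%s\n' \
    'nginx     4271   root   6u IPv4 51306     0t0 TCP *:80 (LISTEN)' \
    'nginx     4271   root   7u IPv4 51307     0t0 TCP *:443 (LISTEN)' |
    listening_ports 4271
```

With the two sample lines this prints 80 and 443, one per line.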

Check out more lsof command examples.

pidstat

pidstat reports statistics for tasks managed by the Linux kernel. It makes troubleshooting I/O-related issues much easier.

List I/O statistics for all PIDs

# pidstat -d

To display I/O stats for a particular PID

# pidstat -p 4271 -d

If you are troubleshooting a process in real time, you can monitor its I/O at a fixed interval. The example below reports every 5 seconds.

[root@localhost ~]# pidstat -p 4362 -d 5
Linux 3.10.0-327.13.1.el7.x86_64 (localhost.localdomain)          08/13/2016             _x86_64_         (2 CPU)

07:01:30 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s Command
07:01:35 PM     0     4362     0.00     0.00     0.00 nginx
07:01:40 PM     0     4362     0.00     0.00     0.00 nginx
07:01:45 PM     0     4362     0.00     0.00     0.00 nginx
07:01:50 PM     0     4362     0.00     0.00     0.00 nginx
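
When the interval output scrolls by for a long time, it can help to keep only the samples that cross a threshold. The following is a rough sketch, not a feature of pidstat: busy_writers is a hypothetical helper that finds the kB_wr/s column from the header row, so it assumes the header and data rows share the same layout as in the output above.

```shell
# busy_writers LIMIT: read `pidstat -d` output on stdin and print the
# samples whose kB_wr/s value is above LIMIT (hypothetical helper).
busy_writers() {
    awk -v limit="$1" '
        # Header row: remember which column holds kB_wr/s.
        /kB_wr\/s/ {
            for (i = 1; i <= NF; i++) if ($i == "kB_wr/s") col = i
            next
        }
        # Data rows share the header layout; skip the Average: summary.
        col && $1 != "Average:" && NF >= col && $col + 0 > limit + 0
    '
}

# Example: sample PID 4362 every 5 seconds, 12 times, and show only
# the samples writing more than 1024 kB/s:
#   pidstat -p 4362 -d 5 12 | busy_writers 1024
```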

top

Probably one of the most used commands on Linux is top. It displays a system summary and current utilization.

Just executing top shows you CPU utilization, process details, the number of tasks, memory utilization, the number of zombie processes, and so on.

top - 11:48:43 up 13 days, 17:25,  1 user,  load average: 0.00, 0.00, 0.00
Tasks:  90 total,   2 running,  88 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.0 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   1829.7 total,    388.1 free,    220.3 used,   1221.4 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   1369.4 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                              
    1 root      20   0  186820  13400   9092 S   0.0   0.7   0:10.87 systemd                                                                                                              
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.13 kthreadd                                                                                                             
    3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp                                                                                                               
    4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp                                                                                                           
    6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H                                                                                                         
    8 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq                                                                                                         
    9 root      20   0       0      0      0 S   0.0   0.0   0:07.35 ksoftirqd/0                                                                                                          
   10 root      20   0       0      0      0 R   0.0   0.0   0:07.30 rcu_sched                                                                                                            
   11 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/0                                                                                                          
   12 root      rt   0       0      0      0 S   0.0   0.0   0:00.50 watchdog/0                                                                                                           
   13 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/0                                                                                                              
   15 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kdevtmpfs                                                                                                            
   16 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 netns                                                                                                                
   17 root      20   0       0      0      0 S   0.0   0.0   0:00.68 kauditd                                                                                                              
   18 root      20   0       0      0      0 S   0.0   0.0   0:00.25 khungtaskd                                                                                                           
   19 root      20   0       0      0      0 S   0.0   0.0   0:00.00 oom_reaper                                                                                                           
   20 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 writeback                                                                                                            
   21 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kcompactd0                                                                                                           
   22 root      25   5       0      0      0 S   0.0   0.0   0:00.00 ksmd                                                                                                                 
   23 root      39  19       0      0      0 S   0.0   0.0   0:05.63 khugepaged                                                                                                           
   24 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 crypto                                                                                                               
   25 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kintegrityd                                                                                                          
   26 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kblockd

To display process details for a specific user

# top -u username

To kill a process, run top and press k. It will prompt you for the PID to signal.

top - 11:49:39 up 13 days, 17:26,  1 user,  load average: 0.00, 0.00, 0.00
Tasks:  91 total,   1 running,  90 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.0 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   1829.7 total,    386.9 free,    221.4 used,   1221.4 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   1368.3 avail Mem 
PID to signal/kill [default pid = 21261] 
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                              
 5834 nginx     20   0  148712   7428   4800 S   0.0   0.4   0:02.37 nginx

ps

ps stands for process status and is a widely used command for getting a snapshot of running processes. It is very useful for finding out whether a process is running and, if so, its PID.

To find the PID and process details by keyword

[root@lab ~]# ps -ef|grep nginx
root      5833     1  0 May24 ?        00:00:00 nginx: master process /usr/sbin/nginx
nginx     5834  5833  0 May24 ?        00:00:02 nginx: worker process
root     21267 18864  0 11:50 pts/0    00:00:00 grep --color=auto nginx
[root@lab ~]#
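
Note that the last line of the output is the grep command itself. A common shell idiom to avoid this is to put the first character of the search word in brackets. The sketch below shows the idea; bracket_first and psgrep are hypothetical helper names, and the pattern builder assumes the word starts with an ordinary letter or digit.

```shell
# bracket_first WORD: turn "nginx" into "[n]ginx". The regex still matches
# "nginx", but the grep command line now contains "[n]ginx", which the
# pattern does not match, so grep no longer lists itself.
bracket_first() {
    printf '%s\n' "$1" | sed 's/^\(.\)/[\1]/'
}

# psgrep WORD: list matching processes without the grep line.
psgrep() {
    ps -ef | grep -- "$(bracket_first "$1")"
}

bracket_first nginx   # prints [n]ginx
```

After defining both functions, `psgrep nginx` should list only the nginx master and worker processes.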

tcpdump

Troubleshooting network issues is always challenging, and one of the essential commands for it is tcpdump.

You can use tcpdump to capture the network packets on a network interface.

To capture the packets on a particular network interface and write them to a file

[root@lab ~]# tcpdump -i eth0 -w /tmp/capture
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C9 packets captured
16 packets received by filter
0 packets dropped by kernel
[root@lab ~]#

As you can see above, tcpdump captured the traffic on the eth0 interface and wrote it to /tmp/capture.

To capture network traffic between a source and a destination IP

# tcpdump src host $SRC_IP and dst host $DST_IP

Capture network traffic for destination port 443

# tcpdump dst port 443
tcpdump: data link type PKTAP
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on pktap, link-type PKTAP (Packet Tap), capture size 262144 bytes
12:02:30.833845 IP 192.168.1.2.49950 > ec2-107-22-185-206.compute-1.amazonaws.com.https: Flags [.], ack 421458229, win 4096, length 0
12:02:32.076893 IP 192.168.1.2.49953 > 104.25.133.107.https: Flags [S], seq 21510813, win 65535, options [mss 1460,nop,wscale 5,nop,nop,TS val 353259990 ecr 0,sackOK,eol], length 0
12:02:32.090389 IP 192.168.1.2.49953 > 104.25.133.107.https: Flags [.], ack 790725431, win 8192, length 0
12:02:32.090630 IP 192.168.1.2.49953 > 104.25.133.107.https: Flags [P.], seq 0:517, ack 1, win 8192, length 517
12:02:32.109903 IP 192.168.1.2.49953 > 104.25.133.107.https: Flags [.], ack 147, win 8187, length 0

Read the captured file

# tcpdump -r filename

Ex: to read the file captured above

# tcpdump -r /tmp/capture

Learn more about tcpdump to capture and analyze the network traffic.

iostat

iostat stands for input/output statistics and is often used to diagnose performance issues with storage devices. You can monitor CPU, device, and network file system utilization with iostat.

Display disk I/O statistics

[root@localhost ~]# iostat -d
Linux 3.10.0-327.13.1.el7.x86_64 (localhost.localdomain)          08/13/2016             _x86_64_         (2 CPU)
Device:           tps   kB_read/s   kB_wrtn/s   kB_read   kB_wrtn
sda               1.82       55.81       12.63     687405     155546
[root@localhost ~]#

Display CPU statistics

[root@localhost ~]# iostat -c
Linux 3.10.0-327.13.1.el7.x86_64 (localhost.localdomain)          08/13/2016             _x86_64_         (2 CPU)
avg-cpu: %user   %nice %system %iowait %steal   %idle
           0.59   0.02   0.33   0.54   0.00   98.52
[root@localhost ~]#
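
If you want to feed the I/O wait figure into a script, it can be pulled out of this report with awk. This is a minimal sketch under the assumption that the output looks like the report above; iowait_pct is a hypothetical helper name.

```shell
# iowait_pct: read `iostat -c` output on stdin and print the %iowait value.
# The header row starts with the extra label "avg-cpu:", so each value sits
# one field to the left of its header.
iowait_pct() {
    awk '
        $1 == "avg-cpu:" {
            for (i = 2; i <= NF; i++) if ($i == "%iowait") col = i - 1
            if (col && (getline) > 0) print $col   # values are on the next line
            exit                                     # only the first report
        }'
}

# Sample lines taken from the output above; on a live system use:
#   iostat -c | iowait_pct
printf '%s\n' \
    'avg-cpu: %user   %nice %system %iowait %steal   %idle' \
    '           0.59   0.02   0.33   0.54   0.00   98.52' |
    iowait_pct
```

For the sample report this prints 0.54.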

ldd

ldd stands for list dynamic dependencies. It shows the shared libraries needed by a program or by another shared library, which is handy for diagnosing application startup problems.

If a program does not start because a dependency is unavailable, you can use ldd to find out which shared libraries it’s looking for.

[root@localhost sbin]# ldd httpd
            linux-vdso.so.1 => (0x00007ffe7ebb2000)
            libpcre.so.1 => /lib64/libpcre.so.1 (0x00007fa4d451e000)
            libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fa4d42f9000)
            libaprutil-1.so.0 => /lib64/libaprutil-1.so.0 (0x00007fa4d40cf000)
            libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007fa4d3e98000)
            libexpat.so.1 => /lib64/libexpat.so.1 (0x00007fa4d3c6e000)
            libdb-5.3.so => /lib64/libdb-5.3.so (0x00007fa4d38af000)
            libapr-1.so.0 => /lib64/libapr-1.so.0 (0x00007fa4d3680000)
            libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa4d3464000)
            libdl.so.2 => /lib64/libdl.so.2 (0x00007fa4d325f000)
            libc.so.6 => /lib64/libc.so.6 (0x00007fa4d2e9e000)
            liblzma.so.5 => /lib64/liblzma.so.5 (0x00007fa4d2c79000)
            /lib64/ld-linux-x86-64.so.2 (0x00007fa4d4a10000)
            libuuid.so.1 => /lib64/libuuid.so.1 (0x00007fa4d2a73000)
            libfreebl3.so => /lib64/libfreebl3.so (0x00007fa4d2870000)
[root@localhost sbin]#
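
When a library cannot be resolved, ldd prints "not found" in place of the path. A small sketch that lists only those entries (missing_libs is a hypothetical helper name):

```shell
# missing_libs: read `ldd` output on stdin and print the names of shared
# libraries the loader could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example, assuming httpd is in the current directory as above:
#   ldd ./httpd | missing_libs
```

An empty result means every dependency was resolved.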

netstat

netstat (network statistics) is a popular command for printing network connections and interface statistics, and for troubleshooting various network-related issues.

To show statistics for all protocols

# netstat -s

You can use grep to check for errors

[root@localhost sbin]# netstat -s | grep error
   0 packet receive errors
   0 receive buffer errors
   0 send buffer errors
[root@localhost sbin]#
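
The grep above also prints counters that are zero. To see only the error counters that have actually increased, you could filter on the leading number. This is a sketch that assumes the "count description" line layout shown above; nonzero_errors is a hypothetical helper name.

```shell
# nonzero_errors: read `netstat -s` output on stdin and keep only the
# error lines whose leading counter is greater than zero.
nonzero_errors() {
    awk '/error/ && $1 ~ /^[0-9]+$/ && $1 + 0 > 0'
}

# Example:
#   netstat -s | nonzero_errors
```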

To show the kernel routing table

[root@localhost sbin]# netstat -r
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window irtt Iface
default         gateway         0.0.0.0         UG       0 0         0 eno16777736
172.16.179.0   0.0.0.0         255.255.255.0   U         0 0         0 eno16777736
192.168.122.0   0.0.0.0         255.255.255.0   U         0 0         0 virbr0
[root@localhost sbin]#

Explore more netstat command examples.

free

If your Linux server is running out of memory, or you just want to find out how much of the total memory is still available, the free command will help you.

[root@localhost sbin]# free -g
             total       used       free     shared buff/cache   available
Mem:             5           0           3           0           1           4
Swap:             5           0           5
[root@localhost sbin]#

-g displays the amounts in gibibytes. As you can see, total memory is 5 GiB, of which 3 GiB is free and 4 GiB is available to applications.
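
For alerting scripts, a percentage is often easier to work with than raw numbers. A minimal sketch, assuming the column layout shown above where "available" is the last column of the Mem: row (mem_available_pct is a hypothetical helper name):

```shell
# mem_available_pct: read `free` output on stdin and print available
# memory as a whole percentage of total memory.
mem_available_pct() {
    awk '$1 == "Mem:" && $2 > 0 { printf "%d\n", $7 * 100 / $2 }'
}

# Example (-m gives finer-grained numbers than -g):
#   free -m | mem_available_pct
```

Applied to the output above (5 total, 4 available) it would report 80.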

sar

sar (System Activity Report) is helpful for collecting a number of reports, including CPU, memory, and device load.

Just executing the sar command shows you system utilization for the entire day.

By default, it records utilization every 10 minutes. If you need shorter, real-time sampling, you can pass an interval and a count as below.

Show CPU report 2 times, 3 seconds apart

[root@localhost sbin]# sar 3 2
Linux 3.10.0-327.13.1.el7.x86_64 (localhost.localdomain)          08/13/2016             _x86_64_         (2 CPU)
11:14:02 PM     CPU     %user     %nice   %system   %iowait   %steal     %idle
11:14:05 PM     all     1.83     0.00     0.50     0.17     0.00     97.51
11:14:08 PM     all     1.50     0.00      0.17     0.00     0.00     98.33
Average:       all     1.67     0.00     0.33     0.08     0.00     97.92
[root@localhost sbin]#

Show memory usage report

# sar -r

Show network report

# sar -n ALL

ipcs

ipcs reports on inter-process communication (IPC) facilities: semaphores, shared memory, and message queues.

To list the message queues

# ipcs -q

To list the semaphores

# ipcs -s

To list the shared memory

# ipcs -m

To display the current usage status of IPC

[root@localhost sbin]# ipcs -u

------ Messages Status --------
allocated queues = 0
used headers = 0
used space = 0 bytes

------ Shared Memory Status --------
segments allocated 5
pages allocated 2784
pages resident 359
pages swapped   0
Swap performance: 0 attempts       0 successes

------ Semaphore Status --------
used arrays = 0
allocated semaphores = 0
[root@localhost sbin]#

ioping

ioping is an external command that has to be installed separately. It is very handy for monitoring disk I/O latency in real time.

Conclusion

I hope the above commands help you in various situations in your system administration job. They are good for on-demand use. However, if you need to monitor Linux servers all the time, you should consider using server monitoring software.

And, to learn more about Linux performance, you can check out this Udemy course.