One of the most frequently used utilities by sysadmins is wget. It can be very handy during web-related troubleshooting.
What is the wget command?
The wget command is a popular Unix/Linux command-line utility for fetching content from the web. It is free to use and provides a non-interactive way to download files from the web. The wget command supports the HTTPS, HTTP, and FTP protocols out of the box. Moreover, you can also use HTTP proxies with it.
How does wget help you troubleshoot?
There are many ways.
As a sysadmin, you'll be working on a terminal most of the time, and when troubleshooting web application-related issues, you may want to check just the connectivity rather than the entire page. Or you may want to verify intranet websites, or download a certain page to verify its content.
wget is non-interactive, which means it can keep running in the background even after you log off. There are many cases where you need to disconnect from the system while retrieving files from the web; wget will keep running in the background and finish its assigned job.
It can also be used to get an entire website onto your local machine. It can follow links in XHTML and HTML pages to create a local version; to do so, it downloads the pages recursively. This is very useful, as you can use it to download important pages or sites for offline viewing.
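As a rough illustration, mirroring a site for offline viewing might look like the below; --recursive, --page-requisites, and --convert-links are standard wget options, and example.com is just a placeholder URL.
wget --recursive --page-requisites --convert-links https://example.com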
Let’s see them in action. The syntax of wget is as below.
wget [option] [URL]
Download a webpage
Let’s try to download a page, for example, github.com.
wget github.com
If connectivity is fine, then it will download the homepage and show the output as below.
root@trends:~# wget github.com
URL transformed to HTTPS due to an HSTS policy
--2020-02-23 10:45:52--  https://github.com/
Resolving github.com (github.com)... 140.82.118.3
Connecting to github.com (github.com)|140.82.118.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html’
index.html               [ <=>                                        ] 131.96K  --.-KB/s    in 0.04s
2020-02-23 10:45:52 (2.89 MB/s) - ‘index.html’ saved [135126]
root@trends:~#
Download multiple files
Handy when you have to download multiple files at once. This can also give you an idea of how to automate file downloads through scripts.
Let’s try to download Python 3.8.1 and 3.5.1 files.
wget https://www.python.org/ftp/python/3.8.1/Python-3.8.1.tgz https://www.python.org/ftp/python/3.5.1/Python-3.5.1.tgz
So, as you can guess, the syntax is as below.
wget URL1 URL2 URL3
You just have to separate the URLs with a space.
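If you have many URLs, one way to script this is with the -i option, which reads URLs from a file. A minimal sketch, assuming urls.txt is a hypothetical file with one URL per line:
wget -i urls.txt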
Limit download speed
This is useful when you want to check how much time your file takes to download at different bandwidths.
Using the --limit-rate option, you can limit the download speed.
Here is the output of downloading the Nodejs file.
root@trends:~# wget https://nodejs.org/dist/v12.16.1/node-v12.16.1-linux-x64.tar.xz
--2020-02-23 10:59:58-- https://nodejs.org/dist/v12.16.1/node-v12.16.1-linux-x64.tar.xz
Resolving nodejs.org (nodejs.org)... 104.20.23.46, 104.20.22.46, 2606:4700:10::6814:162e, ...
Connecting to nodejs.org (nodejs.org)|104.20.23.46|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14591852 (14M) [application/x-xz]
Saving to: ‘node-v12.16.1-linux-x64.tar.xz’
node-v12.16.1-linux-x64.tar.xz 100%[===========================================================================================>] 13.92M --.-KB/s in 0.05s
2020-02-23 10:59:58 (272 MB/s) - ‘node-v12.16.1-linux-x64.tar.xz’ saved [14591852/14591852]
It took 0.05 seconds to download a 13.92 MB file. Now, let’s try to limit the speed to 500K.
root@trends:~# wget --limit-rate=500k https://nodejs.org/dist/v12.16.1/node-v12.16.1-linux-x64.tar.xz
--2020-02-23 11:00:18-- https://nodejs.org/dist/v12.16.1/node-v12.16.1-linux-x64.tar.xz
Resolving nodejs.org (nodejs.org)... 104.20.23.46, 104.20.22.46, 2606:4700:10::6814:162e, ...
Connecting to nodejs.org (nodejs.org)|104.20.23.46|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14591852 (14M) [application/x-xz]
Saving to: ‘node-v12.16.1-linux-x64.tar.xz.1’
node-v12.16.1-linux-x64.tar.xz.1 100%[===========================================================================================>] 13.92M 501KB/s in 28s
2020-02-23 11:00:46 (500 KB/s) - ‘node-v12.16.1-linux-x64.tar.xz.1’ saved [14591852/14591852]
With the bandwidth reduced, the download took longer: 28 seconds. Imagine your users are complaining about slow downloads, and you know their network bandwidth is low. You can quickly try --limit-rate to simulate the issue.
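The rate value accepts k and m suffixes for kilobytes and megabytes per second, so you can sketch out different bandwidth scenarios quickly; the URL below is just the same Node.js file used above.
wget --limit-rate=100k https://nodejs.org/dist/v12.16.1/node-v12.16.1-linux-x64.tar.xz
wget --limit-rate=2m https://nodejs.org/dist/v12.16.1/node-v12.16.1-linux-x64.tar.xz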
Download in the background
Downloading large files can take time, as in the above example where a rate limit was set. This is expected, but what if you don’t want to stare at your terminal?
Well, you can use the -b argument to start wget in the background.
root@trends:~# wget -b https://slack.com
Continuing in background, pid 25430.
Output will be written to ‘wget-log.1’.
root@trends:~#
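Since the output is written to a wget-log file, a simple way to keep an eye on a background download is to tail the log; the exact file name may differ, e.g., wget-log.1 as shown above.
tail -f wget-log.1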
Ignore Certificate Error
This is handy when you need to check intranet web applications that don’t have the proper certificate. By default, wget will throw an error when a certificate is not valid.
root@trends:~# wget https://expired.badssl.com/
--2020-02-23 11:24:59-- https://expired.badssl.com/
Resolving expired.badssl.com (expired.badssl.com)... 104.154.89.105
Connecting to expired.badssl.com (expired.badssl.com)|104.154.89.105|:443... connected.
ERROR: cannot verify expired.badssl.com's certificate, issued by ‘CN=COMODO RSA Domain Validation Secure Server CA,O=COMODO CA Limited,L=Salford,ST=Greater Manchester,C=GB’:
Issued certificate has expired.
To connect to expired.badssl.com insecurely, use `--no-check-certificate'.
The above example is for a URL whose certificate has expired. As you can see, wget suggests using --no-check-certificate, which will skip certificate validation.
root@trends:~# wget https://untrusted-root.badssl.com/ --no-check-certificate
--2020-02-23 11:33:45-- https://untrusted-root.badssl.com/
Resolving untrusted-root.badssl.com (untrusted-root.badssl.com)... 104.154.89.105
Connecting to untrusted-root.badssl.com (untrusted-root.badssl.com)|104.154.89.105|:443... connected.
WARNING: cannot verify untrusted-root.badssl.com's certificate, issued by ‘CN=BadSSL Untrusted Root Certificate Authority,O=BadSSL,L=San Francisco,ST=California,C=US’:
Self-signed certificate encountered.
HTTP request sent, awaiting response... 200 OK
Length: 600 [text/html]
Saving to: ‘index.html.6’
index.html.6 100%[===========================================================================================>] 600 --.-KB/s in 0s
2020-02-23 11:33:45 (122 MB/s) - ‘index.html.6’ saved [600/600]
root@trends:~#
Cool, isn’t it?
HTTP Response Header
You can see the HTTP response headers of a given site on the terminal.
Using -S will print the headers, as you can see below for Coursera.
root@trends:~# wget https://www.coursera.org -S
--2020-02-23 11:47:01-- https://www.coursera.org/
Resolving www.coursera.org (www.coursera.org)... 13.224.241.48, 13.224.241.124, 13.224.241.82, ...
Connecting to www.coursera.org (www.coursera.org)|13.224.241.48|:443... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 511551
Connection: keep-alive
Cache-Control: private, no-cache, no-store, must-revalidate, max-age=0
Date: Sun, 23 Feb 2020 11:47:01 GMT
etag: W/"7156d-WcZHnHFl4b4aDOL4ZSrXP0iBX3o"
Server: envoy
Set-Cookie: CSRF3-Token=1583322421.s1b4QL6OXSUGHnRI; Max-Age=864000; Expires=Wed, 04 Mar 2020 11:47:02 GMT; Path=/; Domain=.coursera.org
Set-Cookie: __204u=9205355775-1582458421174; Max-Age=31536000; Expires=Mon, 22 Feb 2021 11:47:02 GMT; Path=/; Domain=.coursera.org
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
X-Content-Type-Options: nosniff
x-coursera-render-mode: html
x-coursera-render-version: v2
X-Coursera-Request-Id: NCnPPlYyEeqfcxIHPk5Gqw
X-Coursera-Trace-Id-Hex: a5ef7028d77ae8f8
x-envoy-upstream-service-time: 1090
X-Frame-Options: SAMEORIGIN
x-powered-by: Express
X-XSS-Protection: 1; mode=block
X-Cache: Miss from cloudfront
Via: 1.1 884d101a3faeefd4fb32a5d2a8a076b7.cloudfront.net (CloudFront)
X-Amz-Cf-Pop: LHR62-C3
X-Amz-Cf-Id: vqvX6ZUQgtZAde62t7qjafIAqHXQ8BLAv8UhkPHwyTMpvH617yeIbQ==
Length: 511551 (500K) [text/html]
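If you only want the headers without saving the page body, you can combine -S with --spider, which makes wget check the URL without downloading it. A minimal sketch:
wget -S --spider https://www.coursera.org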
Manipulate the User-Agent
There might be a situation where you want to connect to a site using a custom user-agent, or a specific browser’s user-agent. This is doable by specifying --user-agent. The example below uses MyCustomUserAgent as the user agent.
root@trends:~# wget https://gf.dev --user-agent="MyCustomUserAgent"
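To mimic a real browser instead of a custom string, you can pass a full browser user-agent value; the string below is only an illustrative Chrome-style user agent, not a current exact one.
wget https://gf.dev --user-agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"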
Host Header
When an application is still in development, you may not have a proper URL to test it. Or, you may want to test an individual HTTP instance using its IP, but you need to supply the host header for the application to work properly. In this situation, --header would be useful.
Let’s take an example of testing http://10.10.10.1 with the host header set to application.com.
wget --header="Host: application.com" http://10.10.10.1
Not just the host header; you can inject any header you like.
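For instance, --header can be repeated to send several headers in one request; the header name and value below are made-up illustrations.
wget --header="Host: application.com" --header="X-Custom-Header: test123" http://10.10.10.1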
Connect using Proxy
If you are working in a DMZ environment, you may not have direct access to Internet sites, but you can take advantage of a proxy to connect.
wget -e use_proxy=yes -e http_proxy=$PROXYHOST:PORT http://externalsite.com
Don’t forget to replace $PROXYHOST:PORT with the actual proxy host and port.
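Alternatively, wget also honors the standard proxy environment variables, so a rough equivalent (using proxy.example.com:3128 as a placeholder proxy) would be:
export http_proxy=http://proxy.example.com:3128
export https_proxy=http://proxy.example.com:3128
wget http://externalsite.com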
Connect using a specific TLS protocol
Usually, I would recommend using OpenSSL to test TLS protocols, but you can use wget too.
wget --secure-protocol=TLSv1_2 https://example.com
The above will force wget to connect over TLS 1.2.
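If you want to quickly check which protocol versions a server accepts, one rough sketch is to loop over the values --secure-protocol understands (the exact set depends on your wget and SSL library versions; example.com is a placeholder):
for proto in TLSv1 TLSv1_1 TLSv1_2; do
  echo "Testing $proto"
  wget --secure-protocol=$proto --spider https://example.com
done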
Conclusion
Knowing the necessary commands can help you at work. I hope the above gives you an idea of what you can do with wget.