Monitor and analyze web server logs with open source real-time log analyzer – GoAccess

Web troubleshooting can be fun, but it gets frustrating if you are not equipped with the right tools.

If you support a heavy-traffic website, you often need to analyze and monitor web server logs for performance and capacity planning. This is an essential skill for a web engineer.

Checking a small log file manually is fine, but with a large file it is no fun to go through millions of lines to find the metrics you need.
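To give a feel for what "manually" means here, below is a minimal sketch using awk on a tiny sample log. The three log entries are made up for illustration, and I am assuming the common Combined Log Format, where the client IP is field 1 and the status code is field 9.

```shell
# Build a tiny sample log in Combined Log Format (made-up entries for illustration).
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
203.0.113.5 - - [10/Oct/2023:13:55:40 +0000] "GET /missing.html HTTP/1.1" 404 153 "-" "Mozilla/5.0"
198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "curl/8.0"
EOF

# Count requests per HTTP status code (field 9 in this format).
status_counts=$(awk '{ count[$9]++ } END { for (s in count) print s, count[s] }' "$sample_log" | sort)
echo "$status_counts"

# Count requests per client IP (field 1), busiest first.
top_ips=$(awk '{ print $1 }' "$sample_log" | sort | uniq -c | sort -rn)
echo "$top_ips"
```

One-liners like these work for a quick check, but they stop scaling once you want many metrics at once across millions of lines.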

That’s why you need tools that make an administrator’s job easier and more productive.

GoAccess is a lightweight open-source log analyzer that supports multiple log formats and can be used with any of the following.

  • Nginx
  • Apache HTTP
  • AWS ELB, S3, CloudFront
  • Google cloud storage

What metrics can you analyze with GoAccess?

Nearly everything you capture in the logs. To give you an idea:

  • Time taken to serve the request
  • Visitor IP, DNS, host
  • Visitor’s browser & Operating System details
  • 404 not found details
  • Top requests/visitor
  • Bandwidth
  • Static files
  • Geo Location
  • Status Code
  • and more..

Want to monitor these metrics for your site?

Good!

On which OS can you install it?

GoAccess has only one dependency: ncurses. If you can install ncurses, you can use GoAccess on any OS.

It’s available as a distribution package for:

  • Ubuntu
  • Debian
  • Fedora
  • CentOS
  • FreeBSD/OpenBSD
  • Slackware
  • Arch Linux
  • Gentoo
  • MacOS
  • Windows through Cygwin

However, you can also build it from source or run it with Docker.

If you are new to Docker, I would recommend taking this Docker Mastery course.

Installing GoAccess on Ubuntu

  • Log in to the Ubuntu server with root privileges
  • Use apt-get to install it as below
apt-get install goaccess

Easy.

Installing on CentOS

Log in to the server and execute the yum command

yum install goaccess
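If you manage a mix of Ubuntu and CentOS servers, a small helper can pick the right command for you. This is only a sketch: the function name goaccess_install_cmd is my own, and it covers just the two package managers shown above.

```shell
# Print the install command that matches the package manager found on this host.
# Only apt-get and yum are covered, mirroring the two sections above.
goaccess_install_cmd() {
    if command -v apt-get >/dev/null 2>&1; then
        echo "apt-get install goaccess"
    elif command -v yum >/dev/null 2>&1; then
        echo "yum install goaccess"
    else
        echo "unsupported: install from source instead"
        return 1
    fi
}

# Show the suggestion; "|| true" keeps the script going on unsupported hosts.
goaccess_install_cmd || true
```

It only prints the command, so you can review it before running it as root.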

Installing using Source on CentOS/Ubuntu

Love compiling from source?

Here are the steps.

  • Install the following dependencies if using CentOS
yum install gcc ncurses-devel glib2-devel geoip-devel tokyocabinet-devel
  • If using Ubuntu
apt-get install libncursesw5-dev libgeoip-dev make
  • Download the latest package using wget
wget http://tar.goaccess.io/goaccess-1.2.tar.gz
  • Extract the downloaded file
gunzip -c goaccess-1.2.tar.gz | tar xvf -
  • Go to the newly created folder, which you got after extracting
cd goaccess-1.2
  • Compile with the below command
./configure --enable-geoip=legacy --enable-utf8
make
make install
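The download, extract and compile steps above can be chained into one script. Below is a rough sketch, assuming the dependencies are already installed. RUN_BUILD is a made-up switch I added so the script only prints what it would do unless you opt in, and tar -xzf is used as a shorthand for the gunzip and tar pipe.

```shell
# Version and URL come from the steps above; RUN_BUILD is a made-up switch for this sketch.
VERSION="1.2"
TARBALL="goaccess-${VERSION}.tar.gz"
URL="http://tar.goaccess.io/${TARBALL}"
SRC_DIR="goaccess-${VERSION}"

build_goaccess() {
    # Each step runs only if the previous one succeeded.
    wget "$URL" &&
    tar -xzf "$TARBALL" &&
    cd "$SRC_DIR" &&
    ./configure --enable-geoip=legacy --enable-utf8 &&
    make &&
    make install
}

if [ "${RUN_BUILD:-0}" = "1" ]; then
    build_goaccess
else
    echo "Would download $URL and build in $SRC_DIR (set RUN_BUILD=1 to run)"
fi
```

Run it as root with RUN_BUILD=1 once you are happy with the values at the top.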

Well done, you have installed GoAccess and are all set to analyze the logs.

 

Verify Installation

Once installed, just execute goaccess at the command prompt and it should print the usage like below.

# goaccess
GoAccess - 1.2
Usage: goaccess [filename] [ options ... ] [-c][-M][-H][-q][-d][...]
The following options can also be supplied to the command:
Log & Date Format Options
  --date-format=<dateformat>      - Specify log date format. e.g., %d/%b/%Y
  --log-format=<logformat>        - Specify log format. Inner quotes need to be
                                    escaped, or use single quotes.
  --time-format=<timeformat>      - Specify log time format. e.g., %H:%M:%S
User Interface Options
  -c --config-dialog              - Prompt log/date/time configuration window.
  -i --hl-header                  - Color highlight active panel.
  -m --with-mouse                 - Enable mouse support on main dashboard.
  --color=<fg:bg[attrs, PANEL]>   - Specify custom colors. See manpage for more
                                    details and options.
  --color-scheme=<1|2|3>          - Schemes: 1 => Grey, 2 => Green, 3 => Monokai.
  --html-custom-css=<path.css>    - Specify a custom CSS file in the HTML report.
  --html-custom-js=<path.js>      - Specify a custom JS file in the HTML report.
  --html-prefs=<json_obj>         - Set default HTML report preferences.
  --html-report-title=<title>     - Set HTML report page title and header.
  --json-pretty-print             - Format JSON output w/ tabs & newlines.
  --max-items                     - Maximum number of items to show per panel.
                                    See man page for limits.
  --no-color                      - Disable colored output.
  --no-column-names               - Don't write column names in term output.
  --no-csv-summary                - Disable summary metrics on the CSV output.
  --no-progress                   - Disable progress metrics.
  --no-tab-scroll                 - Disable scrolling through panels on TAB.
  --no-html-last-updated          - Hide HTML last updated field.
Server Options
  --addr=<addr>                   - Specify IP address to bind server to.
  --daemonize                     - Run as daemon (if --real-time-html enabled).
  --fifo-in=<path>                - Path to read named pipe (FIFO).
  --fifo-out=<path>               - Path to write named pipe (FIFO).
  --origin=<addr>                 - Ensure clients send the specified origin header
                                    upon the WebSocket handshake.
  --port=<port>                   - Specify the port to use.
  --real-time-html                - Enable real-time HTML output.
  --ssl-cert=<cert.crt>           - Path to TLS/SSL certificate.
  --ssl-key=<priv.key>            - Path to TLS/SSL private key.
  --ws-url=<url>                  - URL to which the WebSocket server responds.
File Options
  -                               - The log file to parse is read from stdin.
  -f --log-file=<filename>        - Path to input log file.
  -l --debug-file=<filename>      - Send all debug messages to the specified
                                    file.
  -p --config-file=<filename>     - Custom configuration file.
  --invalid-requests=<filename>   - Log invalid requests to the specified file.
  --no-global-config              - Don't load global configuration file.
Parse Options
  -a --agent-list                 - Enable a list of user-agents by host.
  -d --with-output-resolver       - Enable IP resolver on HTML|JSON output.
  -e --exclude-ip=<IP>            - Exclude one or multiple IPv4/6. Allows IP
                                    ranges e.g. 192.168.0.1-192.168.0.10
  -H --http-protocol=<yes|no>     - Set/unset HTTP request protocol if found.
  -M --http-method=<yes|no>       - Set/unser HTTP request method if found.
  -o --output=file.html|json|csv  - Output either an HTML, JSON or a CSV file.
  -q --no-query-string            - Ignore request's query string. Removing the
                                    query string can greatly decrease memory
                                    consumption.
  -r --no-term-resolver           - Disable IP resolver on terminal output.
  --444-as-404                    - Treat non-standard status code 444 as 404.
  --4xx-to-unique-count           - Add 4xx client errors to the unique visitors
                                    count.
  --all-static-files              - Include static files with a query string.
  --crawlers-only                 - Parse and display only crawlers.
  --date-spec=<date|hr>           - Date specificity. Possible values: `date`
                                    (default), or `hr`.
  --double-decode                 - Decode double-encoded values.
  --enable-panel=<PANEL>          - Enable parsing/displaying the given panel.
  --hour-spec=<hr|min>            - Hour specificity. Possible values: `hr`
                                    (default), or `min` (tenth of a min).
  --ignore-crawlers               - Ignore crawlers.
  --ignore-panel=<PANEL>          - Ignore parsing/displaying the given panel.
  --ignore-referer=<NEEDLE>       - Ignore a referer from being counted. Wild cards
                                    are allowed. i.e., *.bing.com
  --ignore-status=<CODE>          - Ignore parsing the given status code.
  --num-tests=<number>            - Number of lines to test. >= 0 (10 default)
  --process-and-exit              - Parse log and exit without outputting data.
  --real-os                       - Display real OS names. e.g, Windows XP, Snow
                                    Leopard.
  --sort-panel=PANEL,METRIC,ORDER - Sort panel on initial load. For example:
                                    --sort-panel=VISITORS,BY_HITS,ASC. See
                                    manpage for a list of panels/fields.
  --static-file=<extension>       - Add static file extension. e.g.: .mp3.
                                    Extensions are case sensitive.
GeoIP Options
  -g --std-geoip                  - Standard GeoIP database for less memory
                                   consumption.
  --geoip-database=<path>         - Specify path to GeoIP database file. i.e.,
                                    GeoLiteCity.dat, GeoIPv6.dat ...
Other Options
  -h --help                       - This help.
  -V --version                    - Display version information and exit.
  -s --storage                    - Display current storage method. e.g., B+
                                    Tree, Hash.
  --dcf                           - Display the path of the default config
                                    file when `-p` is not used.
Examples can be found by running `man goaccess`.
For more details visit: http://goaccess.io
GoAccess Copyright (C) 2009-2016 by Gerardo Orellana
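If you want to verify the installation from a script rather than by eye, a check along these lines may help. It is a sketch that relies only on the --version option listed in the usage above; the function name check_goaccess is my own.

```shell
# Report whether the goaccess binary is on PATH, and print its version line if so.
check_goaccess() {
    if command -v goaccess >/dev/null 2>&1; then
        goaccess --version | head -n 1
    else
        echo "goaccess not found in PATH"
        return 1
    fi
}

check_goaccess || echo "Install it with your package manager or from source first."
```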

Analyzing Nginx & Apache with GoAccess

One of the quickest ways to analyze access.log is by using the -f parameter.

Ex:

goaccess -f access.log

Above, I am instructing GoAccess to open the file access.log. This will show you the overall dashboard and 15 sections, including the following.

  • Unique visitors per day
  • Requested files
  • Static requests (fonts, image, pdf, etc)
  • Not found (404) requests
  • Visitor’s IP/host details
  • Visitor’s OS
  • Browser details
  • Time distribution
  • Referrer
  • HTTP status code
  • Geo location

If the chosen file is being updated in real time, you will notice the metrics update on the terminal. Here, you can go through the metrics you need to analyze.
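If you would rather have a file than the terminal dashboard, the -o option from the usage output writes a report instead. Here is a minimal sketch that builds a throwaway sample log and generates a static HTML report from it. The log entries and temp file names are made up, and I am assuming the log is in Combined Log Format.

```shell
# Three made-up requests in Combined Log Format, just enough to produce a report.
sample_log=$(mktemp)
report="${sample_log}.html"
cat > "$sample_log" <<'EOF'
203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
203.0.113.5 - - [10/Oct/2023:13:55:40 +0000] "GET /missing.html HTTP/1.1" 404 153 "-" "Mozilla/5.0"
198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "curl/8.0"
EOF

if command -v goaccess >/dev/null 2>&1; then
    # -o with an .html file name writes an HTML report instead of opening the dashboard.
    goaccess -f "$sample_log" --log-format=COMBINED -o "$report"
    echo "Report written to $report"
else
    echo "goaccess is not installed; skipping report generation"
fi
```

Point -f at your real access.log once the sample run looks right.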

Real-time Monitoring over HTTP(s)

GoAccess lets you write the output to an HTML file, which you can use for real-time monitoring. This is handy when you don’t want to log in to the server each time you need to verify some metrics.

goaccess /var/log/nginx/access.log -o /var/www/geekflare.com/htdocs/real-time.html --log-format=COMBINED --real-time-html

Above, I am writing the output to the real-time.html file, which is available under htdocs. Since it’s under htdocs, I can access this file from https://geekflare.com/real-time.html whenever I need to see the metrics.
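To keep the real-time report running after you log out, the --daemonize and --port options from the usage output can be added to the same command. The sketch below is one possible way to wrap it. The port value and the RUN switch are my assumptions, and by default it only prints the command so you can review it first.

```shell
# Paths are the ones used above; WS_PORT is an example value for the WebSocket server.
LOG_FILE="/var/log/nginx/access.log"
REPORT="/var/www/geekflare.com/htdocs/real-time.html"
WS_PORT="7890"

# Print the full command line so it can be reviewed before running.
realtime_cmd() {
    printf 'goaccess %s -o %s --log-format=COMBINED --real-time-html --port=%s --daemonize\n' \
        "$LOG_FILE" "$REPORT" "$WS_PORT"
}

# RUN is a made-up switch for this sketch: set RUN=1 to actually start GoAccess.
if [ "${RUN:-0}" = "1" ] && command -v goaccess >/dev/null 2>&1; then
    goaccess "$LOG_FILE" -o "$REPORT" --log-format=COMBINED \
        --real-time-html --port="$WS_PORT" --daemonize
else
    realtime_cmd
fi
```

The browser needs to reach the chosen port for live updates, so keep that in mind when setting up firewall rules.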

A beautiful dashboard!

However, I wouldn’t recommend doing it this way in production. I am sure you don’t want someone else to read your web server logs, so you may want to apply the following restrictions.

  • Protect the file with a username and password
  • Allow access only from your IP
  • Use a different URL with a custom port and put it behind a firewall so only allowed IPs/users can access it
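As an illustration of the first two restrictions, here is a sketch that writes an Nginx location block combining an IP allow list with basic authentication. I am assuming the report is served by Nginx; the IP address, the password file path and the temp file are placeholders, not values from my setup.

```shell
# Hypothetical values for illustration: replace with your own IP and password file path.
ALLOWED_IP="203.0.113.10"
HTPASSWD_FILE="/etc/nginx/.htpasswd"
SNIPPET=$(mktemp)

# Write an Nginx location block that requires both an allowed IP and a valid login.
cat > "$SNIPPET" <<EOF
location = /real-time.html {
    allow $ALLOWED_IP;
    deny all;
    auth_basic "Restricted";
    auth_basic_user_file $HTPASSWD_FILE;
}
EOF

cat "$SNIPPET"
```

You would paste the generated block into the relevant server section and reload Nginx. The password file can be created separately, for example with the htpasswd utility.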

GoAccess looks like a powerful open-source log analyzer. It’s lightweight and FREE, so go ahead and give it a try.

You may also be interested in checking out a cloud-based log analyzer.