
Every AWS service logs its processing into log files organized under CloudWatch log groups. The log groups are usually named after the service itself for easier identification. By default, the service's system messages and common state information are written into those log files.

However, you can add custom log messages on top of the default ones. Created wisely, such logs can power useful CloudWatch dashboards.

A dashboard is not limited to standard widgets with system-level information about the service. You can extend it with your own content: metrics and structured information that give extra details about job processing, aggregated into custom widgets and metrics.

Query the Log Files

[Image: AWS CloudWatch Logs (source: aws.amazon.com)]

AWS CloudWatch Logs Insights allows you to search and analyze log data from your AWS resources in near real time. You can look at it as a database view: you define the query on the dashboard, and the dashboard runs it whenever you visit it, over the time window you specify in the dashboard view.

It uses a purpose-built query language to search and analyze log data, with a pipe-based syntax loosely reminiscent of SQL. It lets you search and filter log data: you can look for specific log events, custom log text, or keywords, and filter log entries based on specific fields. Most importantly, you can aggregate log data across one or more log files to generate summarized metrics and visualizations.

When you run a query, CloudWatch Logs Insights searches through the log data in the log group and returns the log events that match your query criteria.

Examples of Log File Queries

Let’s have a look at some basic queries to understand the concept.

Every service logs some crucial service errors by default, even if you don't create a dedicated custom log for such error events. With a simple query, you can then count the number of errors in your application logs, grouped into one-hour buckets:

fields @timestamp, @message
| filter @message like /ERROR/
| stats count() by bin(1h)

Or here is how to monitor the average response time of your API, averaged per day. Note that response_time must be extracted before it can be aggregated; the parse pattern below assumes log messages like "API response time: 123":

fields @timestamp, @message
| filter @message like /API response time/
| parse @message /API response time: (?<response_time>\d+)/
| stats avg(response_time) by bin(1d)

If a service writes CPU utilization information into its logs (for example, as structured JSON events with a value field), you can gather this type of metric as well:

fields @timestamp, @message
| filter @message like /CPUUtilization/
| stats avg(value) by bin(1h)

These queries can be customized to fit your specific use case and can be used to create custom metrics and visualizations in CloudWatch Dashboards. The way to do it is to place a widget on the dashboard and put the query inside the widget to define what it selects.

Here are some of the widget types you can use in CloudWatch Dashboards together with Logs Insights content:

  • Text widgets – Display static Markdown text, such as titles and descriptions accompanying your query results.
  • Log query widgets – Display the results of a CloudWatch Logs Insights query, such as the number of errors in your application logs.

How to Create Useful Log Information for Dashboards

[Image: AWS CloudWatch Dashboard (source: aws.amazon.com)]

To effectively use CloudWatch Insights queries in CloudWatch Dashboards, it’s good to follow some best practices when creating CloudWatch logs for each of the services you use in your system. Here are some tips:

#1. Use Structured Logging 

Stick to a logging format that uses a predefined schema to write data in a structured form. This makes it easier to search and filter log data using CloudWatch Insights queries.

This basically means standardizing your logs across the different services in your platform. Having that defined in your development standards helps tremendously.

For example, you can define that each problem related to a specific database table will be logged with a message starting like: "[TABLE_NAME] Warning / Error: <message>".

Or you can separate full data jobs from delta data jobs with prefixes like "[FULL]" / "[DELTA]" to select only messages related to a concrete data process.

Similarly, you can define that while processing data from a specific source system, the name of that system will prefix each related log entry. It is then much easier to filter such messages out of the log files and build metrics over them, as the sketch below shows.
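As a minimal sketch, assuming the "[DELTA]" prefix convention described above, a query that counts delta-job warnings per hour could look like this:

fields @timestamp, @message
| filter @message like /\[DELTA\]/
| filter @message like /Warning/
| stats count() by bin(1h)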

[Image: AWS CloudWatch structured logging (source: aws.amazon.com)]

#2. Use Consistent Log Formats

Use consistent log formats across all your AWS resources to make it easier to search and filter log data using CloudWatch Insights queries. 

This is closely related to the previous point, but the fact is: the more standardized the log format, the easier it is to use the log data. Developers can then rely on that format and even use it intuitively.

The cruel fact is that most projects don't bother with any standards around logging. What's more, many projects don't create any custom logs at all. It's shocking, but at the same time very common.

I can't even count how many times I found myself wondering how people can live without any error-handling approach. And when somebody did make the effort to handle errors, it was often done wrong.

So a consistent log format is a strong asset. Not many teams have one.
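As an illustration, a simple team-wide line convention (a made-up example, not a standard) could look like this:

2023-05-12T10:15:00Z ERROR [BILLING] [DELTA] Load of invoice table failed: timeout
2023-05-12T10:16:01Z INFO [BILLING] [FULL] Full load of customer table finished in 42s

With the severity, owning system, and job type always in the same position, a single filter expression can slice the logs any way you need.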

#3. Include Relevant Metadata

Include metadata in your log data, such as timestamps, resource IDs, and error codes, to make it easier to search and filter log data using CloudWatch Insights queries.
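For illustration, a structured JSON log event carrying such metadata (the field names here are hypothetical) might look like this:

{
  "timestamp": "2023-05-12T10:15:00Z",
  "resource_id": "i-0123456789abcdef0",
  "error_code": "DB_CONN_TIMEOUT",
  "message": "[CUSTOMER] Error: connection to the customer table timed out"
}

Because Logs Insights automatically discovers fields in JSON log events, you can then filter on them directly, for example with filter error_code = "DB_CONN_TIMEOUT".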

#4. Enable Log Rotation

Enable log rotation, which in CloudWatch terms means setting a retention policy, to prevent your log data from growing without bound and to keep it easy to search and filter with CloudWatch Insights queries.

Having no log data is one thing, but having too much of it without structure is similarly desperate. If you can't use your data, it's like having no data at all.
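CloudWatch Logs implements this through retention settings on the log group rather than classic file rotation. As a sketch, assuming a log group named /my-app/production, you can cap its retention with the AWS CLI:

aws logs put-retention-policy --log-group-name /my-app/production --retention-in-days 30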

#5. Use CloudWatch Logs Agents

If you can't help yourself and just refuse to build a customized log system, then at least use the CloudWatch Logs agent. It automatically ships log data from your resources to CloudWatch Logs, which makes the data searchable and filterable with CloudWatch Insights queries; a minimal configuration sketch follows below.
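Here is a minimal sketch of the agent's log collection configuration; the file path and log group name are placeholders:

{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/my-app/app.log",
            "log_group_name": "/my-app/production",
            "log_stream_name": "{instance_id}"
          }
        ]
      }
    }
  }
}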

More Complex Insights Query Examples

A CloudWatch Insights query can be more complicated than a two-line statement:

fields @timestamp, @message
| filter @message like /ERROR/
| filter @message not like /404/
| parse @message /.*\[(?<timestamp>[^\]]+)\].*\"(?<method>[^\s]+)\s+(?<path>[^\s]+).*\" (?<status>\d+) (?<response_time>\d+)/
| stats avg(response_time) as avg_response_time, count() as count by bin(1h), method, path, status
| sort count desc
| limit 20

This query does the following: 

  1. Selects log events that contain the string “ERROR” but not “404”. 
  2. Parses the log message to extract the timestamp, HTTP method, path, status code, and response time. 
  3. Calculates the average response time and count of log events for each combination of HTTP method, path, status code, and hour. 
  4. Sorts the results by count in descending order. 
  5. Limits the output to the top 20 results.

This query identifies the most common errors in your application and tracks the average response time for each combination of HTTP method, path, and status code. You can use the results to create custom metrics and visualizations in CloudWatch Dashboards to monitor the performance of your web application and troubleshoot issues.

Another example of querying Amazon S3 service messages:

fields @timestamp, @message
| filter @message like /REST\.API\.REQUEST/
| parse @message /.*\"(?<method>[^\s]+)\s+(?<path>[^\s]+).*\" (?<status>\d+) (?<response_time>\d+)/
| stats avg(response_time) as avg_response_time, count() as count by bin(1h), method, path, status
| sort count desc
| limit 20
  • The query selects log events that contain the string “REST.API.REQUEST”. 
  • Then parses the log message to extract the HTTP method, path, status code, and response time. 
  • It calculates the average response time and count of log events for each combination of HTTP method, path, and status code and sorts the results by count in descending order. 
  • Limits the output to the top 20 results.

You can use the output of this query to create a line graph in a CloudWatch Dashboard that shows the average response time for each combination of HTTP method, path, and status code over time.

Building the Dashboard

To fill in the metrics and visualizations in CloudWatch Dashboards from the output of CloudWatch Insights log queries, you can navigate to the CloudWatch console and follow the Dashboard wizard to build up your content. 

After that, this is what the source code of a CloudWatch Dashboard looks like; it contains a metric widget and a log widget backed by a CloudWatch Insights query:

{
    "widgets": [
        {
            "type": "metric",
            "x": 0,
            "y": 0,
            "width": 12,
            "height": 6,
            "properties": {
                "metrics": [
                    [
                        "AWS/EC2",
                        "CPUUtilization",
                        "InstanceId",
                        "i-0123456789abcdef0",
                        {
                            "label": "CPU Utilization",
                            "stat": "Average",
                            "period": 300
                        }
                    ]
                ],
                "view": "timeSeries",
                "stacked": false,
                "region": "us-east-1",
                "title": "EC2 CPU Utilization"
            }
        },
        {
            "type": "log",
            "x": 0,
            "y": 6,
            "width": 12,
            "height": 6,
            "properties": {
                "query": "fields @timestamp, @message
| filter @message like /ERROR/
| stats count() by bin(1h)
",
                "region": "us-east-1",
                "title": "Application Errors"
            }
        }
    ]
}

This CloudWatch Dashboard contains two widgets:

  1. A metric widget that displays the average CPU utilization of an EC2 instance over time. This one is driven by CloudWatch metric data: it selects the CPUUtilization metric for a specific EC2 instance and aggregates it at 5-minute intervals.
  2. A log widget that displays the number of application errors over time. This one is populated by the CloudWatch Insights query: it selects log events that contain the string "ERROR" and aggregates them by the hour.

It's a JSON document that defines the dashboard and its widgets, and it contains the Insights query itself as a property (with newlines escaped). The query's SOURCE clause names the log group to search; 'my-app-log-group' above is a placeholder.

You can take the code and deploy it to whatever AWS account you need. Assuming the services and log messages are consistent across all your AWS accounts and stages, the dashboard will work on every account without any change to its source code.
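As a sketch, assuming the JSON above is saved as dashboard.json (the dashboard name below is a placeholder), you can deploy it with the AWS CLI:

aws cloudwatch put-dashboard --dashboard-name my-app-dashboard --dashboard-body file://dashboard.json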

Final Words

Building up a solid logging structure has always been a good investment in a system's future reliability. Now it can serve an even greater purpose: you get useful dashboards with metrics and visualizations almost as a side effect.

It needs to be done only once, and with just a little additional work, the development team, the testing team, and production users can all benefit from the same solution.

Next, check out the best AWS monitoring tools.
