Let’s talk about time-series databases for distributed monitoring.
A time-series database is optimized for timestamped, or time-series, data. Time-series data are measurements or events that are tracked, monitored, collected, or aggregated over a period of time. Examples include heartbeats from motion-tracking sensors, JVM metrics from Java applications, market trade data, network data, API responses, and process uptime.
Time-series databases are built specifically for timestamped data: they index it by time and store it so that writes are efficient. As a result, you can usually insert and query time-series data much faster than you could in a general-purpose relational or NoSQL database.
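To make the time-indexing idea concrete, here is a minimal sketch of why keeping points sorted by timestamp makes range queries cheap. The `TinySeries` class and its method names are hypothetical and not taken from any of the databases below; real engines add compression, on-disk layout, and much more.

```python
from bisect import bisect_left, bisect_right


class TinySeries:
    """Hypothetical in-memory series: timestamps stay sorted for fast range scans."""

    def __init__(self):
        self.timestamps = []  # epoch seconds, ascending
        self.values = []

    def append(self, ts, value):
        # Time-series writes are mostly append-only and arrive in time order.
        if self.timestamps and ts < self.timestamps[-1]:
            raise ValueError("out-of-order write not supported in this sketch")
        self.timestamps.append(ts)
        self.values.append(value)

    def query_range(self, start, end):
        # Binary search locates both ends of the window in O(log n).
        lo = bisect_left(self.timestamps, start)
        hi = bisect_right(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))


cpu = TinySeries()
for ts, value in [(100, 0.31), (110, 0.35), (120, 0.90), (130, 0.42)]:
    cpu.append(ts, value)

window = cpu.query_range(105, 125)
print(window)
```

Because the data is ordered by time, a query for "the last five minutes" never has to scan the whole series, which is the core advantage over a table indexed on something else.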
Lately, time-series databases have gained a lot of popularity. And why not? They do a fantastic job for business and IT operations monitoring. The good news is that there are plenty of options to choose from, and most of them are open-source.
InfluxDB, written in Go, is one of the most popular time-series databases among DevOps teams. It was designed from the ground up to provide a highly scalable data ingestion and storage engine. It is very efficient at collecting, storing, querying, visualizing, and acting on streams of time-series data, events, and metrics in real time.
It provides downsampling and data retention policies so that high-value, high-precision data can be kept in memory while lower-value data goes to disk. It is built in a cloud-native fashion to scale across multiple deployment topologies, including cloud, on-premises, and hybrid environments.
InfluxDB is open-source and enterprise-ready. It uses InfluxQL, which is very similar to Structured Query Language (SQL), for interacting with data. The latest version offers agents, dashboards, queries, and tasks in a toolkit. It is an all-in-one tool for dashboarding, visualizing, and alerting.
- High performance for time-series data with high-throughput ingest and real-time querying
- InfluxQL, a SQL-like query language, to interact with data
- Core component of the TICK (Telegraf, InfluxDB, Chronograf, and Kapacitor) stack.
- Plugin support for protocols such as collectd, Graphite, OpenTSDB for data ingestion
- Can handle millions of data points per second
- Retention policies for automatically removing the stale data
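The downsampling and retention ideas mentioned above can be sketched in a few lines of plain Python. This is only an illustration of the concepts, not InfluxDB's implementation or API; the function names `downsample` and `apply_retention` are made up for this example.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, bucket_seconds):
    """Average (timestamp, value) points into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        bucket_start = ts - (ts % bucket_seconds)  # align to bucket boundary
        buckets[bucket_start].append(value)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]


def apply_retention(points, now, max_age_seconds):
    """Drop points older than the retention window."""
    cutoff = now - max_age_seconds
    return [(ts, value) for ts, value in points if ts >= cutoff]


raw = [(0, 10.0), (20, 20.0), (40, 30.0), (60, 40.0), (90, 60.0)]

rollup = downsample(raw, bucket_seconds=60)                  # one point per minute
recent = apply_retention(raw, now=100, max_age_seconds=50)   # keep the last 50 s

print(rollup)
print(recent)
```

A common pattern is to combine the two: keep raw points for a short window and keep only the rolled-up averages for the long term.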
Since it’s open-source, you can download it and get started on your own server. If you would rather not self-host, InfluxDB Cloud is also offered on AWS, Azure, and GCP.
Prometheus is an open-source monitoring solution used to draw insights from metrics data and send the necessary alerts. It has a local time-series database that stores data on disk in a custom format.
Prometheus’s data model is multi-dimensional and based on time series; it stores all data as streams of timestamped values. It is especially useful when working with purely numeric time series. Collecting and querying data from microservices is one of the strengths of Prometheus.
It integrates tightly with Grafana for visualization. If you are a newbie, read this Prometheus and Grafana introduction article.
- Has a multi-dimensional data model that uses a metric name and key-value pairs (labels)
- PromQL for querying time-series data to generate tables, alerts, and ad hoc graphs
- Uses HTTP pull mode for collecting time-series data
- Supports pushing time series through an intermediary gateway
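To show what "a metric name plus labels" looks like on the wire, here is a small sketch that renders samples in the Prometheus text exposition format, which is what a scrape over HTTP returns. The helper `render_metric` and the sample metric are illustrative only; in practice you would normally use an official client library rather than formatting the text by hand.

```python
def render_metric(name, metric_type, help_text, samples):
    """Render one metric family in Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs.
    Note: this sketch does not escape special characters in label values.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


body = render_metric(
    "http_requests_total",
    "counter",
    "Total HTTP requests.",
    [
        ({"method": "post", "code": "200"}, 1027),
        ({"method": "post", "code": "400"}, 3),
    ],
)
print(body)
```

Each distinct combination of label values is its own time series, which is what makes the model multi-dimensional.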
Prometheus has hundreds of exporters to export the data from Windows, Linux, Java, Database, APIs, Website, Server Hardware, PHP, Messaging, and more. To monitor Linux, check this Prometheus + Grafana setup.
TimescaleDB is an open-source relational database that makes SQL scalable for time-series data. This database is built on PostgreSQL.
It comes in two editions. The first is a community edition that is free to use and that you can install on your own server. The second is TimescaleDB Cloud, where you get fully hosted and managed infrastructure in the cloud for your deployment needs.
It can be used for DevOps monitoring, understanding application metrics, tracking data from IoT devices, understanding financial data, etc. You can measure logs, Kubernetes events, Prometheus metrics, even custom metrics.
For product owners, you can use it to understand a product’s performance over time, which helps in making strategic decisions for growth.
- Runs queries 10-100x faster than PostgreSQL and MongoDB
- Scales horizontally to petabytes and writes millions of data points per second
- Very similar to PostgreSQL, so easy for developers and admins to operate
- Combines relational and time-series database functionalities to build powerful applications.
- Built-in algorithms and performance features that help reduce costs
Graphite is an all-in-one solution for storing and efficiently visualizing real-time time-series data. Graphite does two things: it stores time-series data and renders graphs on demand. It doesn’t collect data for you; for that, you can use tools such as collectd, Ganglia, Sensu, Telegraf, etc.
It has three components: Carbon, Whisper, and Graphite-Web. Carbon receives the time-series data, aggregates it, and persists it to disk. Whisper is the time-series database library that stores the data. Graphite-Web is the front end for creating dashboards and visualizing the data.
- The metrics format in which the data is submitted is straightforward.
- Comprehensive API for rendering the data and creating charts, dashboards, graphs
- Provides a rich set of statistical and transformative rendering functions
- Chains multiple render functions to construct a target query.
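The two points about the simple metrics format and chained render functions can be illustrated with a short sketch. The helpers `plaintext_line` and `chain` are hypothetical names for this example; the metric path is invented, while `sumSeries`, `movingAverage`, and `alias` are existing Graphite render functions.

```python
import json


def plaintext_line(path, value, timestamp):
    """Format one metric in Graphite's plaintext form: '<path> <value> <timestamp>'."""
    return f"{path} {value} {timestamp}\n"


def chain(target, *calls):
    """Wrap a target in render functions, innermost first.

    Each call is (function_name, extra_args); the series is always the
    first argument, and string arguments are emitted in double quotes.
    """
    for name, args in calls:
        parts = [target] + [json.dumps(a) for a in args]
        target = f"{name}({','.join(parts)})"
    return target


line = plaintext_line("servers.web01.cpu.load", 0.42, 1700000000)
target = chain(
    "servers.*.cpu.load",
    ("sumSeries", []),
    ("movingAverage", [10]),
    ("alias", ["total load"]),
)
print(line, end="")
print(target)
```

The resulting target string is what you would pass as the `target` parameter of a render request, so a whole transformation pipeline fits in a single query parameter.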
QuestDB is a relational, column-oriented database that can perform real-time analytics on time-series data. It uses SQL with some extensions to create a relational model for time-series data. QuestDB has been coded from scratch and has no dependencies, which helps its performance.
QuestDB supports both relational and time-series joins, which helps in correlating data. The easiest way to get started with QuestDB is to deploy it inside a Docker container.
- Interactive console to import data using drag and drop and query it
- Supported on cloud-native (AWS, Azure, GCP), on-premises, or embedded
- Provides enterprise integration with features such as active directory, high availability, enterprise security, clustering
- Provides insights in real-time using operational and predictive analytics
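One common kind of time-series join is the as-of join, which pairs each row with the most recent row from another series. The sketch below shows the idea in plain Python, assuming that is the kind of join meant above; it is not QuestDB's SQL syntax or implementation, and the `asof_join` helper and the sample data are invented for illustration.

```python
from bisect import bisect_right


def asof_join(left, right):
    """For each left (ts, value), attach the latest right value with ts <= left ts.

    Both inputs are lists of (timestamp, value) sorted by timestamp.
    Left rows with no earlier right row get None.
    """
    right_ts = [ts for ts, _ in right]
    joined = []
    for ts, value in left:
        idx = bisect_right(right_ts, ts) - 1  # last right row at or before ts
        match = right[idx][1] if idx >= 0 else None
        joined.append((ts, value, match))
    return joined


trades = [(5, "BUY 10"), (12, "SELL 4"), (30, "BUY 7")]
quotes = [(10, 101.5), (20, 102.0), (25, 101.8)]

result = asof_join(trades, quotes)
for row in result:
    print(row)
```

Unlike an ordinary equality join, the timestamps on the two sides do not need to match exactly, which is why this style of join is useful for correlating independently sampled series.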
How can AWS not be on the list?
AWS Timestream is a serverless time-series database service that is fast and scalable. It is used mainly for IoT applications, can store trillions of events per day, and is billed as 1,000 times faster at one-tenth the cost of relational databases.
Using its purpose-built query engine, you can query recent data and historical stored data simultaneously. It provides multiple built-in functions to analyze time-series data to find useful insights.
Amazon Timestream features:
- No servers to manage or instances to provision; everything is handled automatically.
- Cost-effective, pay only for what you ingest, store, and query.
- Capable of ingesting trillions of events daily with no drop in performance
- Built-in analytics capability with standard SQL, interpolation, and smoothing functions to identify trends, patterns, and anomalies
- All data is encrypted using the AWS Key Management Service (KMS) with customer managed keys (CMK)
OpenTSDB is a scalable time-series database that has been written on top of HBase. It is capable of storing trillions of data points at millions of writes per second. You can keep the data in OpenTSDB forever with its original timestamp and precise value, so you don’t lose any data.
It has a time-series daemon (TSD) and command-line utilities. The TSD is responsible for storing data in HBase and retrieving it. You can talk to the TSD using the HTTP API, telnet, or a simple built-in GUI. You need tools like Flume, collectd, vacuumetrix, etc., to collect data from various sources into OpenTSDB.
- Can aggregate, filter, downsample metrics at breakneck speed
- Stores and writes data with millisecond precision
- Runs on Hadoop and HBase and scales easily by adding nodes to the cluster
- Uses GUI to generate graphs
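As a rough illustration of what a collector sends to the TSD over the telnet-style interface, the sketch below formats a data point as a `put` line (`put <metric> <timestamp> <value> <tag=value ...>`). The helper name `put_line` and the sample metric and tags are chosen for this example; actually sending the line would require opening a socket to a running TSD, which is left out here.

```python
def put_line(metric, timestamp, value, tags):
    """Format one data point as an OpenTSDB telnet-style 'put' command."""
    if not tags:
        # OpenTSDB expects every data point to carry at least one tag.
        raise ValueError("at least one tag is required")
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {timestamp} {value} {tag_str}\n"


command = put_line(
    "sys.cpu.user",
    1356998400,
    42.5,
    {"host": "webserver01", "cpu": "0"},
)
print(command, end="")
```

Tags play the same role here as labels do in Prometheus: they let you filter and aggregate one metric across hosts, CPUs, or any other dimension.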
More and more IoT and smart devices are in use, websites generate huge volumes of real-time traffic with millions of events a day, and trading on the markets keeps increasing. All of this has made the time-series database a must-have in your production stack for monitoring.
Most of the time-series databases listed above can be self-hosted, so go ahead, get a cloud VM, and give them a try to see what works for you.