Want to speed up database queries? Learn how to create a database index using SQL and optimize query performance—and speed up data retrieval.
When you’re retrieving data from a database table, you’ll have to filter based on specific columns more often.
Suppose you write an SQL query to retrieve the data based on specific conditions. By default, running the query runs a full-table scan until all the records that satisfy the condition have been found and then returns the results.
This can be extremely inefficient when you have to query a large database table with several million rows. You can speed up such queries by creating a database index.
What is a Database Index?
When you want to find a specific term in a book, will you do a full-book scan—one page after the other—looking for the particular term? Well, you don’t.
You’ll instead look up the index to find out which pages reference the term and jump straight to those pages. An index in a database works a lot like the indexes in a book.
A database index is a set of pointers or references to the actual data but sorted in a way that makes data retrieval faster. Internally, a database index can be implemented using data structures such as B+ trees and hash tables. Therefore, a database index improves the speed and efficiency of data retrieval operations.
Creating a Database Index in SQL
Now that we know what a database index is and how it can speed up data retrieval, let’s learn how to create a database index in SQL.
When you perform filtering operations—by specifying the retrieval condition using a WHERE clause—you might want to query a particular column more frequently than others.
CREATE INDEX index_name ON table (column)
index_name is the name of the index to be created
table refers to the table in the relational database
column refers to the name of the column in the database table on which we need to create the index.
You can also create indexes on multiple columns—a multi-column index—depending on the requirements. Here’s the syntax to do so:
CREATE INDEX index_name ON table (column_1, column_2,...,column_k)
Now let’s move on to a practical example.
Understanding the Performance Gains of Database Index
To understand the advantage of creating an index, we need to create a database table with a large number of records. The code examples are for SQLite. But you can also use other RDBMS of your choice, such as PostgreSQL and MySQL.
Populating a Database Table With Records
You can also use Python’s built-in random module to create and insert records into the database. However, we’ll use Faker to populate the database table with a million rows.
The following Python script:
Creates and connects to the customer_db database.
Create a customers table with the fields: first_name, last_name, city, and num_orders.
Generates synthetic data and inserts data—one million records—into the customers table.
from faker import Faker
# connect to the db
db_conn = sqlite3.connect('customer_db.db')
db_cursor = db_conn.cursor()
# create table
db_cursor.execute('''CREATE TABLE customers (
id INTEGER PRIMARY KEY,
# create a Faker object
fake = Faker()
# create and insert 1 million records
num_records = 1_000_000
for _ in range(num_records):
first_name = fake.first_name()
last_name = fake.last_name()
city = fake.city()
num_orders = random.randint(0,100)
db_cursor.execute('INSERT INTO customers (first_name, last_name, city, num_orders) VALUES (?,?,?,?)', (first_name, last_name, city, num_orders))
# commit the transaction and close the cursor and connection
Now we can start querying.
Creating an Index on the City Column
Suppose you want to get the customer information by filtering based on the city column. Your SELECT query will look like so:
SELECT column(s) FROM customers
So let’s create the city_idx on the city column in the customers table:
CREATE INDEX city_idx ON customers (city);
⚠ Creating an index takes a non-negligible amount of time and is a one-time operation. But the performance benefits when you need to large number of queries—by filtering on the city column—will be significant.
Deleting a Database Index
To delete an index, you can use the DROP INDEX statement like so:
DROP INDEX index_name;
Comparing Query Times With and Without Index
If you want to run queries within a Python script, you can use the default timer to get the execution times for queries.
Alternatively, you can run the queries using the sqlite3 command-line client. To work with customer_db.db using the command-line client, run the following command at the terminal:
$ sqlite3 customer_db.db;
To get the approximate execution times, you can use the .timer functionality built into sqlite3 like so:
sqlite3 > .timer on
> <query here>
Because we’ve created an index on the city column, the queries which involve filtering based on the city column in the WHERE clause will be much faster.
First, run the queries. Then, create the index and re-run the queries. Note down the execution times in both the cases. Here are some examples:
Time without Index
Time with Index
SELECT * FROM customers WHERE city LIKE ‘New%’ LIMIT 10;
SELECT * FROM customers WHERE city=’New Wesley’;
SELECT * FROM customers WHERE city IN (‘New Wesley’, ‘New Steven’, ‘New Carmenmouth’);
We see that the retrieval times with index is several orders faster than those without index on the city column.
Best Practices to Create and Use Database Indexes
You should always check if the performance gains are greater than the overhead of creating a database index. Here are some best practices to keep in mind:
Choose the right columns to create an index. Avoid creating too many indexes because of the substantial overhead.
Every time an indexed column is updated, the corresponding index should be updated, too. So creating a database index (though speeds up retrieval) significantly slows down insertions and update operations. Therefore, you should create indexes on columns that are frequently queried but rarely updated.
When Should You Not Create an Index?
By now you should have an idea of when and how to create an index. But let’s also state when database index might not be necessary:
When the database table is small and does not contain a large number of rows, full-table scan to retrieve data is not as expensive.
Do not create indexes on columns that are rarely used for retrieval. When you create indexes on columns that are not frequently queried, the cost of creating and maintaining an index outweighs the performance gains.
Let’s review what we’ve learned:
When querying a database to retrieve data, you may need to filter based on certain columns more frequently. A database index on such frequently queried columns can improve performance.
To create an index on a single column, use the syntax: CREATE INDEX index_name ON table (column). If you want to create s multi-column index, use: CREATE INDEX index_name ON table (column_1, column_2,...,column_k)
Whenever an indexed column is modified, the corresponding index should be updated, too. Therefore, choose the right columns—frequently queried and much less frequently updated—to create an index.
If the database table is relatively smaller, the cost of creating, maintaining, and updating an index will be greater than the performance gains.
In most modern database management systems, there’s a query optimizer that checks if an index on a specific column will make the query run faster. Next, let’s learn the best practices for database design.
Bala Priya C
Bala Priya is a developer and technical writer from India with over three years of experience in the technical content writing space. She shares her learning with the developer community by authoring tech tutorials, how-to guides, and more…. read more
Narendra Mohan Mittal
Narendra Mohan Mittal is a Senior Digital Branding Strategist and Content Editor with over 12 years of versatile experience. He holds an M-Tech (Gold Medalist) and B-Tech (Gold Medalist) in Computer Science & Engineering.
Cloud data warehouses provide a scalable solution for storing and analyzing data in real-time. They enable businesses to reduce costs and improve efficiency by eliminating the need for on-premises hardware and maintenance. Here is our list of the best cloud data warehouse platforms.