Geekflare is supported by our audience. We may earn affiliate commissions from buying links on this site.
In Database Last updated: August 22, 2023
Share on:
Invicti Web Application Security Scanner – the only solution that delivers automatic verification of vulnerabilities with Proof-Based Scanning™.

Want to speed up database queries? Learn how to create a database index using SQL and optimize query performance—and speed up data retrieval.

When you’re retrieving data from a database table, you’ll have to filter based on specific columns more often.

Suppose you write an SQL query to retrieve the data based on specific conditions. By default, running the query runs a full-table scan until all the records that satisfy the condition have been found and then returns the results.

This can be extremely inefficient when you have to query a large database table with several million rows. You can speed up such queries by creating a database index.

What is a Database Index?

What-Is-a-Database-Index-And-How-Does-It-Help

When you want to find a specific term in a book, will you do a full-book scan—one page after the other—looking for the particular term? Well, you don’t.

You’ll instead look up the index to find out which pages reference the term and jump straight to those pages. An index in a database works a lot like the indexes in a book.

A database index is a set of pointers or references to the actual data but sorted in a way that makes data retrieval faster. Internally, a database index can be implemented using data structures such as B+ trees and hash tables. Therefore, a database index improves the speed and efficiency of data retrieval operations.

Creating a Database Index in SQL

Now that we know what a database index is and how it can speed up data retrieval, let’s learn how to create a database index in SQL.

When you perform filtering operations—by specifying the retrieval condition using a WHERE clause—you might want to query a particular column more frequently than others.

CREATE INDEX index_name ON table (column)

Here,

  • index_name is the name of the index to be created
  • table refers to the table in the relational database 
  • column refers to the name of the column in the database table on which we need to create the index.

You can also create indexes on multiple columns—a multi-column index—depending on the requirements. Here’s the syntax to do so:

CREATE INDEX index_name ON table (column_1, column_2,...,column_k)

Now let’s move on to a practical example.

Understanding the Performance Gains of Database Index

To understand the advantage of creating an index, we need to create a database table with a large number of records. The code examples are for SQLite. But you can also use other RDBMS of your choice, such as PostgreSQL and MySQL.

Populating a Database Table With Records

You can also use Python’s built-in random module to create and insert records into the database. However, we’ll use Faker to populate the database table with a million rows.

The following Python script:

  • Creates and connects to the customer_db database.
  • Create a customers table with the fields: first_name, last_name, city, and num_orders.
  • Generates synthetic data and inserts data—one million records—into the customers table.

You can also find the code on GitHub.

# main.py
# imports
import sqlite3
from faker import Faker
import random

# connect to the db
db_conn = sqlite3.connect('customer_db.db')
db_cursor = db_conn.cursor()

# create table
db_cursor.execute('''CREATE TABLE customers (
                  id INTEGER PRIMARY KEY,
                  first_name TEXT,
                  last_name TEXT,
                  city TEXT,
                  num_orders INTEGER)''')

# create a Faker object
fake = Faker()
Faker.seed(27)

# create and insert 1 million records
num_records = 1_000_000

for _ in range(num_records):
    first_name = fake.first_name()
    last_name = fake.last_name()
    city = fake.city()
    num_orders = random.randint(0,100)
    db_cursor.execute('INSERT INTO customers (first_name, last_name, city, num_orders) VALUES (?,?,?,?)', (first_name, last_name, city, num_orders))

# commit the transaction and close the cursor and connection
db_conn.commit()
db_cursor.close()
db_conn.close()

Now we can start querying.

Creating an Index on the City Column

Suppose you want to get the customer information by filtering based on the city column. Your SELECT query will look like so:

SELECT column(s) FROM customers
WHERE condition;

So let’s create the city_idx on the city column in the customers table:

CREATE INDEX city_idx ON customers (city);

⚠ Creating an index takes a non-negligible amount of time and is a one-time operation. But the performance benefits when you need to large number of queries—by filtering on the city column—will be significant.

Deleting a Database Index

To delete an index, you can use the DROP INDEX statement like so:

DROP INDEX index_name;

Comparing Query Times With and Without Index

If you want to run queries within a Python script, you can use the default timer to get the execution times for queries.

Alternatively, you can run the queries using the sqlite3 command-line client. To work with customer_db.db using the command-line client, run the following command at the terminal:

$ sqlite3 customer_db.db;

To get the approximate execution times, you can use the .timer functionality built into sqlite3 like so:

sqlite3 > .timer on
        > <query here>

Because we’ve created an index on the city column, the queries which involve filtering based on the city column in the WHERE clause will be much faster.

First, run the queries. Then, create the index and re-run the queries. Note down the execution times in both the cases. Here are some examples:

QueryTime without IndexTime with Index
SELECT * FROM customers
WHERE city LIKE ‘New%’
LIMIT 10;
0.100 s0.001 s
SELECT * FROM customers
WHERE city=’New Wesley’;
0.148 s0.001 s
SELECT * FROM customers
WHERE city IN (‘New Wesley’, ‘New Steven’, ‘New Carmenmouth’);
0.247 s0.003 s

We see that the retrieval times with index is several orders faster than those without index on the city column.

Best Practices to Create and Use Database Indexes

Best-Practices-to-Create-and-Use-Database-Indexes

You should always check if the performance gains are greater than the overhead of creating a database index. Here are some best practices to keep in mind:

  • Choose the right columns to create an index. Avoid creating too many indexes because of the substantial overhead. 
  • Every time an indexed column is updated, the corresponding index should be updated, too. So creating a database index (though speeds up retrieval) significantly slows down insertions and update operations. Therefore, you should create indexes on columns that are frequently queried but rarely updated.

When Should You Not Create an Index?

By now you should have an idea of when and how to create an index. But let’s also state when database index might not be necessary:

  • When the database table is small and does not contain a large number of rows, full-table scan to retrieve data is not as expensive.
  • Do not create indexes on columns that are rarely used for retrieval. When you create indexes on columns that are not frequently queried, the cost of creating and maintaining an index outweighs the performance gains.

Summing Up

Let’s review what we’ve learned:

  • When querying a database to retrieve data, you may need to filter based on certain columns more frequently. A database index on such frequently queried columns can improve performance.
  • To create an index on a single column, use the syntax: CREATE INDEX index_name ON table (column). If you want to create s multi-column index, use: CREATE INDEX index_name ON table (column_1, column_2,...,column_k)
  • Whenever an indexed column is modified, the corresponding index should be updated, too. Therefore, choose the right columns—frequently queried and much less frequently updated—to create an index.
  • If the database table is relatively smaller, the cost of creating, maintaining, and updating an index will be greater than the performance gains. 

In most modern database management systems, there’s a query optimizer that checks if an index on a specific column will make the query run faster. Next, let’s learn the best practices for database design.

  • Bala Priya C
    Author
    Bala Priya is a developer and technical writer from India with over three years of experience in the technical content writing space. She shares her learning with the developer community by authoring tech tutorials, how-to guides, and more…. read more
  • Narendra Mohan Mittal
    Editor

    Narendra Mohan Mittal is a Senior Digital Branding Strategist and Content Editor with over 12 years of versatile experience. He holds an M-Tech (Gold Medalist) and B-Tech (Gold Medalist) in Computer Science & Engineering.


    read more
Thanks to our Sponsors
More great readings on Database
Power Your Business
Some of the tools and services to help your business grow.
  • Invicti uses the Proof-Based Scanning™ to automatically verify the identified vulnerabilities and generate actionable results within just hours.
    Try Invicti
  • Web scraping, residential proxy, proxy manager, web unlocker, search engine crawler, and all you need to collect web data.
    Try Brightdata
  • Monday.com is an all-in-one work OS to help you manage projects, tasks, work, sales, CRM, operations, workflows, and more.
    Try Monday
  • Intruder is an online vulnerability scanner that finds cyber security weaknesses in your infrastructure, to avoid costly data breaches.
    Try Intruder