Pandas Column Renaming Made Easy: Methods and Best Practices

This article is a guide on how to rename columns in Pandas.

Pandas is a Python library for working with datasets. It can read data from file formats such as CSV and JSON, as well as from SQL databases. When data is loaded into Pandas, it is stored in a DataFrame object.

A DataFrame is a two-dimensional object, meaning data is stored in a table-like format with rows and columns. This is similar to storing data in CSV or spreadsheet files. When you load data, pandas takes the column names from the source, for example the header row of a CSV file.

However, the loaded column names may not be ideal, and you may want to rename the columns to something more meaningful.

In this article, we will first discuss the best practices for naming columns in Pandas. Afterward, we will get to the main subject, which is the methods to rename them.

Best Practices for Naming Columns in Pandas

Before we get to the renaming guide portion of this article, here are some best practices and conventions you may want to follow when naming your columns in pandas.

✅ Use descriptive names. Cryptic names like col_1 are hard to understand and do not convey much information about the data contained in the dataset.

✅ Use snake case when naming columns. In snake case, your column names will look like this: number_of_people, instead of like this: NumberOfPeople.

✅ While snake case is preferred, you should use the naming convention that your original dataset uses. This avoids confusion when moving between your dataset and the Pandas DataFrame object.

✅ Whichever naming convention you use, remain consistent throughout the dataset. Avoid naming some columns using PascalCase and others using snake_case.

✅ Lastly, try to use shorter names. Code suggestion and completion in notebooks can be less reliable than in a full IDE, so you often end up typing column names by hand, and shorter names make that easier.
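The snake case convention above can be applied to existing labels in one pass. The following is a minimal sketch, not a complete solution: the helper name to_snake_case and the sample column names are made up for illustration, and the regular expression only handles simple PascalCase or camelCase labels (an acronym such as ID would be split letter by letter).

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Convert a simple PascalCase or camelCase label to snake_case."""
    # Insert an underscore before every uppercase letter except the first character.
    with_underscores = re.sub(r"(?<!^)(?=[A-Z])", "_", name)
    return with_underscores.lower()


# Hypothetical DataFrame with inconsistent column names.
people = pd.DataFrame(
    {"NumberOfPeople": [3], "MedianIncome": [2.5], "latitude": [34.0]}
)

# rename accepts a function for the columns argument and calls it on each label.
people_snake = people.rename(columns=to_snake_case)
snake_labels = list(people_snake.columns)
print(snake_labels)
```

Labels that are already in snake case, such as latitude here, pass through unchanged.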

How to Rename Columns in Pandas

You can consume the content of this article in two ways. First, you could just read through this as a reference. Second, you could follow along, coding as well, so you have a better chance of remembering the concepts discussed. I recommend the latter method.

To code along, I will be using a notebook hosted with Google Colab. You can create one as well and follow along; it is completely free. The notebook with all the code I will write in this tutorial is available here.

Setting up the Notebook

Before we start renaming columns in pandas, let’s set up the notebook and load some sample data. Create a code cell and import pandas using the code below.

import pandas as pd

After importing pandas, you can load the California housing dataset (california_housing_train.csv), which is available by default as a sample dataset when you create a Google Colab notebook.

housing_data = pd.read_csv('/content/sample_data/california_housing_train.csv')

You can see the first few rows of the dataset using the code:

housing_data.head()

You can also list the columns present in the dataset with the following:

housing_data.columns

This should produce the following output:

Index(['longitude', 'latitude', 'housing_median_age', 'total_rooms',
       'total_bedrooms', 'population', 'households', 'median_income',
       'median_house_value'],
      dtype='object')

This means your data has been loaded correctly, and the DataFrame has what we need.
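If you are following along outside Google Colab, the sample file path above will not exist. One possible workaround, sketched here under that assumption, is to build a tiny stand-in DataFrame with the same column names. The header is copied from the output above; the single data row is made up purely so the DataFrame is not empty.

```python
import io

import pandas as pd

# Header matches the column list shown above; the values in the data row are
# invented placeholders, not real records from the dataset.
csv_text = (
    "longitude,latitude,housing_median_age,total_rooms,total_bedrooms,"
    "population,households,median_income,median_house_value\n"
    "-120.0,35.0,20,1000,200,500,180,3.5,150000\n"
)

# read_csv accepts any file-like object, so StringIO stands in for a file on disk.
housing_data = pd.read_csv(io.StringIO(csv_text))
print(housing_data.columns.tolist())
```

The renaming methods in the rest of the article only depend on the column labels, so they behave the same way on this stand-in.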

Method 1: Using the rename() Method

The easiest way to rename columns in pandas is to use the rename method of the DataFrame object. The method takes several arguments.

In this case, we are interested in renaming a column, so we will pass in the columns keyword argument. The value of this argument is a dictionary whose entries represent the mapping from the old column names to the new ones. Here is an example where we rename the households column to houses.

housing_data.rename(columns={ 'households': 'houses' })

This returns a new DataFrame in which the households column is named houses. The original housing_data is left unchanged unless you assign the result back to it.

As you can see, we pass in a dictionary where the key is the old column name, and the value is the new column name. The columns argument does not have to be a dictionary. It can also be a function that receives each old column name and returns the new one. If you want to rename more columns, you can add more entries to the dictionary.
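Here is a small sketch of those variations on rename. The three-column DataFrame and its values are made up so the example runs on its own; the calls themselves are the same ones you would use on housing_data.

```python
import pandas as pd

# Made-up stand-in with a few of the housing column names.
sample = pd.DataFrame(
    {"households": [180], "population": [500], "median_income": [3.5]}
)

# Several columns at once: one dictionary entry per column to rename.
renamed = sample.rename(
    columns={"households": "houses", "population": "number_of_people"}
)

# A function instead of a dictionary: it is called on every column label.
uppercased = sample.rename(columns=str.upper)

# rename returns a new DataFrame, so the original labels are still in place.
print(list(renamed.columns))
print(list(uppercased.columns))
print(list(sample.columns))
```

To keep the new names, assign the result back, for example sample = sample.rename(columns=...).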

Method 2: Replacing the Column String

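Another method you could use for renaming columns in pandas is running a string replacement over the column labels of a DataFrame. Suppose you wanted to rename the column currently named population to number_of_people. Using this method, you would write the following code. Note that str.replace works on substrings, so every label that contains the text population is changed, not only an exact match.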

housing_data.columns = housing_data.columns.str.replace('population', 'number_of_people')

To display the modified DataFrame, we use the following code:

housing_data

The output shows the DataFrame with the population column now named number_of_people.
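Because the replacement is applied to substrings, it can touch more columns than intended. The sketch below illustrates this with a hypothetical population_density column, which is not part of the housing dataset, and shows one way to restrict the change to an exact match.

```python
import pandas as pd

# Empty DataFrame with made-up labels; population_density is hypothetical.
demo = pd.DataFrame(columns=["population", "population_density", "households"])

# regex=False treats the pattern as literal text rather than a regular expression.
replaced = demo.columns.str.replace("population", "number_of_people", regex=False)
replaced_labels = list(replaced)

# An exact-match alternative: only a label equal to "population" is changed.
exact_labels = [
    "number_of_people" if label == "population" else label
    for label in demo.columns
]

print(replaced_labels)
print(exact_labels)
```

In the housing dataset only one label contains population, so the original one-liner is safe there. On wider datasets the exact-match form, or rename from Method 1, is the more predictable choice.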

Method 3: Assigning a List of Column Names

Alternatively, you can rename columns in Pandas by assigning a list to the columns property of the DataFrame. For this example, if I wanted to rename all the columns so that they all use numbers, I could use the following code:

housing_data.columns = [x for x in range(9)]

In this example, I have set the housing_data.columns attribute to a list of integers from 0 to 8. To generate the list, I used list comprehension, which is a native Python feature to conveniently generate lists of values using a for loop.

The disadvantage of using this method is that you have to supply the entire set of column names, in order, and the list must have exactly as many entries as the DataFrame has columns. You cannot just rename a subset of columns. Ideally, your column names should be something more descriptive, but I am just using numbers here as a demonstration.

You can view the output by writing the following:

housing_data
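The sketch below shows what this looks like in practice on a made-up three-column DataFrame: a full list of labels is accepted, a list of the wrong length raises a ValueError, and a list comprehension over the existing labels is one way to change a single name while still using assignment.

```python
import pandas as pd

# Made-up DataFrame used only for this demonstration.
demo = pd.DataFrame({"a": [1], "b": [2], "c": [3]})

# A list with one label per column is accepted.
demo.columns = ["first", "second", "third"]
labels_after_assignment = list(demo.columns)

# A list with the wrong number of labels is rejected.
try:
    demo.columns = ["first", "second"]
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Rebuild the full list from the current labels, changing only one of them.
demo.columns = ["middle" if label == "second" else label for label in demo.columns]
final_labels = list(demo.columns)

print(labels_after_assignment, mismatch_raised, final_labels)
```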

Method 4: Using the set_axis() Method to Rename Columns in Pandas

The last method we will discuss is the set_axis method of the DataFrame object. This method assigns a list of labels to either of the two axes of a DataFrame. Since we are renaming columns, we are setting axis 1. To use this method, we use the following code:

column_names = [str(x) for x in range(8, -1, -1)]
housing_data = housing_data.set_axis(column_names, axis=1)

The first line generates a list of string labels from '8' down to '0' and stores them in the column_names variable. In the second line, we call the set_axis method, providing column_names as an argument and setting the axis to be modified as axis 1. set_axis returns a new DataFrame, so we assign the result back to housing_data. Older examples pass inplace=True instead, but that parameter was deprecated in pandas 1.5 and removed in pandas 2.0.

We can view the DataFrame by writing:

housing_data

The output shows the DataFrame with its columns labeled '8' through '0' from left to right.
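As a further illustration, here is a minimal sketch of set_axis with descriptive labels on a made-up two-column DataFrame. It assumes a pandas version where set_axis returns a new object, which is also why the original labels survive until you reassign.

```python
import pandas as pd

# Made-up DataFrame used only for this demonstration.
demo = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# set_axis returns a relabeled DataFrame; axis=1 targets the columns.
relabeled = demo.set_axis(["first", "second"], axis=1)

print(list(relabeled.columns))
print(list(demo.columns))
```

Because it returns the relabeled DataFrame rather than modifying it in place, set_axis fits well in a chain of method calls.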

Final Words

This article briefly introduced how data is stored in tabular format in pandas. We also discussed the best practices for naming columns in Pandas to make our lives easier.

Lastly and most importantly, we also discussed the different methods of renaming columns in pandas.

Next, check out how to create a Pandas DataFrame [with examples].
