Mean, median, and mode are fundamental topics of statistics. You can easily calculate them in Python, with and without the use of external libraries.
These three are the main measures of central tendency. The central tendency lets us know the “normal” or “average” values of a dataset. If you’re just starting with data science, this is the right tutorial for you.
By the end of this tutorial you’ll:
Understand the concept of mean, median, and mode
Be able to create your own mean, median, and mode functions in Python
Make use of Python’s statistics module to quickstart the use of these measurements
If you want a downloadable version of the following exercises, feel free to check out the GitHub repository.
Let’s get into the different ways to calculate mean, median, and mode.
Calculating the Mean in Python
The mean or arithmetic average is the most used measure of central tendency.
Remember that central tendency is a typical value of a set of data.
A dataset is a collection of data, therefore a dataset in Python can be any of the following built-in data structures:
Lists, tuples, and sets: a collection of objects
Strings: a collection of characters
Dictionary: a collection of key-value pairs
Note: Altought there are other data structures in Python like queues or stacks, we’ll be using only the built-in ones.
We can calculate the mean by adding all the values of a dataset and dividing the result by the number of values. For example, if we have the following list of numbers:
[1, 2, 3, 4, 5, 6]
The mean or average would be 3.5 because the sum of the list is 21 and its length is 6. Twenty-one divided by six is 3.5. You can perform this calculation with the below calculation:
(1 + 2 + 3 + 4 + 5 + 6) / 6 = 21
In this tutorial, we’ll be using the players of a basketball team as our sample data.
Creating a Custom Mean Function
Let’s start by calculating the average (mean) age of the players in a basketball team. The team’s name will be “Pythonic Machines”.
The “pythonic_machine_ages” is a list with the ages of basketball players
We define a mean() function which returns the sum of the given dataset divided by its length
The sum() function returns the total sum (ironically) of the values of an iterable, in this case, a list. Try to pass the dataset as an argument, it’ll return 211
The len() function returns the length of an iterable, if you pass the dataset to it you’ll get 8
We pass the basketball team ages to the mean() function and print the result.
If you check the output, you’ll get:
# Because 211 / 8 = 26.375
This output represents the average age of the basketball team players. Note how the number doesn’t appear in the dataset but describes precisely the age of most players.
Using mean() from the Python Statistic Module
Calculating measures of central tendency is a common operation for most developers. That’s because Python’s statistics module provides diverse functions to calculate them, along with other basic statistics topics.
from statistics import mean
pythonic_machine_ages = [19, 22, 34, 26, 32, 30, 24, 24]
In the above code, you just need to import the mean() function from the statistics module and pass the dataset to it as an argument. This will return the same result as the custom function we defined in the previous section:
Now you have crystal clear the concept of mean let’s continue with the median measurement.
Finding the Median in Python
The median is the middle value of a sorted dataset. It is used — again — to provide a “typical” value of a determined population.
In programming, we can define the median as the value that separates a sequence into two parts — The lower half and the higher half —.
To calculate the median, first, we need to sort the dataset. We could do this with sorting algorithms or using the built-in function sorted(). The second step is to determine whether the dataset length is odd or even. Depending on this some of the following process:
Odd: The median is the middle value of the dataset
Even: The median is the sum of the two middle values divided by two
Continuing with our basketball team dataset, let’s calculate the players’ median height in centimeters:
[181, 187, 196, 196, 198, 203, 207, 211, 215]
# Since the dataset is odd, we select the middle value
median = 198
As you can see, since the dataset length is odd, so we can take the middle value as the median. However, what would happen if a player just got retired?
We would need to calculate the median taking the two middle values of the dataset
[181, 187, 196, 198, 203, 207, 211, 215]
# We select the two middle values, and divide them by 2
median = (198 + 203) / 2
median = 200.5
Creating a Custom Median Function
Let’s implement the above concept into a Python function.
Remember the three steps we need to follow to get the median of a dataset:
Sort the dataset: We can do this with the sorted() function
Determine if it’s odd or even: We can do this by getting the length of the dataset and using the modulo operator (%)
Return the median based on each case:
Odd: Return the middle value
Even: Return the average of the two middle values
That would result in the following function:
pythonic_machines_heights = [181, 187, 196, 196, 198, 203, 207, 211, 215]
after_retirement = [181, 187, 196, 198, 203, 207, 211, 215]
data = sorted(dataset)
index = len(data) // 2
# If the dataset is odd
if len(dataset) % 2 != 0:
# If the dataset is even
return (data[index - 1] + data[index]) / 2
Note how we create a data variable that points to the sorted database at the start of the function. Although the lists above are sorted, we want to create a reusable function, therefore sorting the dataset each time the function is invoked.
The index stores the middle value — or the upper-middle value — of the dataset, by using the integer division operator. For instance, if we were passing the “pythonic_machine_heights” list it would have the value of 4.
Remember that in Python sequence indexes start at zero, that’s because we’re able to return the middle index of a list, with an integer division.
Then we check if the length of the dataset is odd by comparing the result of the modulo operation with any value that isn’t zero. If the condition is true, we return the middle element, for instance, with the “pythonic_machine_heights” list:
On the other hand, if the dataset is even we return the sum of the middle values divided by two. Note that data[index -1] gives us the lower midpoint of the dataset, while data[index] supplies us with the upper midpoint.
Using median() from the Python Statistic Module
This way is much simpler because we’re using an already existent function from the statistics module.
Personally, if there is something already defined for me, I would use it because of the DRY —Don’t repeat yourself — principle (in this case, don’t repeat other’s code).
You can calculate the median of the previous datasets with the following code:
Congratulations! If you followed so far, you learned how to calculate the mean, median, and mode, the main central tendency measurements.
Although you can define your custom functions to find mean, median, and mode, it’s recommended to use the statistics module, since it’s part of the standard library and you need to install nothing to start using it.
When a user opens a website, one of the first things they notice is the header. A website header is the top section of a webpage, which contains elements such as a site’s logo, navigation menu, and functionalities such as searching and logging in.
Financial charting libraries help you to add stock market and digital asset market movement charts in any app. You will find both HTML5 charting libraries and JS libraries for app development projects.