  • In this article, you’ll learn to check the size of a file or folder in Python

    Python is one of the most versatile programming languages. With it, you can build anything from a small CLI (command-line interface) program to a complex web application.

    However, one of its most underrated features is its ability to interact with operating systems. Managing OS operations with Python can save you tons of time when creating automation processes.

    Let’s see how Python interacts with the OS.

    How Does Python Interact with the OS?

    Python interacts with the OS through the os, sys, pathlib, and subprocess modules

    No one can live isolated from their environment. That also applies to Python, where it is sometimes essential to interact with the operating system to get stuff done.

    Python has several modules that let us interact with the OS. The most used are os, sys, pathlib, and subprocess.

    Since they are part of the standard library, you won’t need to install them with pip. You can import all of them with the following statements:

    import os
    import sys
    import pathlib
    import subprocess

    The list below summarizes the main functionality of each of these modules:

    • os: A portable way of using system-specific (depending on your OS) functionality. It is the right choice in most cases unless you need something more advanced
    • sys: System-specific parameters and functions. This module provides access to interpreter variables and functions. The os module interacts with the operating system, while sys interacts with the Python interpreter
    • pathlib: Advanced path usage. Lets you represent filesystem paths as objects, with the appropriate semantics for each OS.
    • subprocess: Execution and management of subprocesses directly from Python. That involves working with stdin, stdout, and return codes. You can learn more about it by reading our Python subprocess guide.
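    As a rough illustration of the division of labor described in the list above, the following sketch makes one small call to each module. The variable names are only for illustration, and the child process is simply another Python interpreter so that the example does not depend on any particular system command.

```python
import os
import subprocess
import sys
from pathlib import Path

# os: ask the operating system for the current working directory.
cwd_from_os = os.getcwd()

# sys: ask the interpreter which platform it is running on.
platform_name = sys.platform  # e.g. "linux", "darwin", or "win32"

# pathlib: the same working directory, represented as a Path object.
cwd_as_path = Path.cwd()

# subprocess: run a child process (here, another Python interpreter)
# and capture what it writes to standard output.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,
    text=True,
)

print(cwd_from_os)
print(platform_name)
print(cwd_as_path.name)
print(result.returncode, result.stdout.strip())
```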

    There are high-level libraries that include even more specific functionality depending on your needs. However, most of the time you’re good to go with the above modules.

    Note: Many of the functions provided by these modules produce different output depending on your OS. As a rule of thumb, they tend to behave most consistently on Unix-like systems.

    Now that you have a quick grasp of how Python interacts with the OS, let’s jump into the methods of checking file and folder size. All of the following solutions are available in the File and folder size in Python GitHub repository.

    Using os.stat().st_size

    In this method, we’re going to use the stat() function from the os module. It returns a lot of information about a specific path.

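    Note: The os.path.getsize() function also gets the job done; it simply returns os.stat(path).st_size. Both follow symbolic links by default. If you need the size of the link itself, use os.lstat() or os.stat(path, follow_symlinks=False).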
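    To see how these calls treat symbolic links in practice, a small experiment like the one below can help. It is a sketch that assumes a platform where os.symlink() is allowed (on Windows it may need extra privileges); the file names are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "target.txt")
    link = os.path.join(tmp, "link.txt")

    # Create a 1000-byte file and a symbolic link that points to it.
    with open(target, "wb") as fh:
        fh.write(b"x" * 1000)
    os.symlink(target, link)

    # These two follow the link and report the size of the target file.
    size_via_stat = os.stat(link).st_size
    size_via_getsize = os.path.getsize(link)

    # These two describe the link itself, not the file it points to.
    size_of_link = os.lstat(link).st_size
    size_no_follow = os.stat(link, follow_symlinks=False).st_size

print(size_via_stat, size_via_getsize)
print(size_of_link, size_no_follow)
```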

    Before continuing, let’s create a test file named lorem.txt, in which we’re going to paste some dummy text. We can visit a Lorem Ipsum text generator and paste the text into the lorem.txt file.

    In the same directory, create a file with the name method1.py and paste the code below:

    import os
    size = os.stat('lorem.txt').st_size
    print(size)

    Let’s break down what we’re doing with this code:

    • In the first line, we’re importing the os module
    • The size variable contains the size of the file lorem.txt
      • The os.stat() function returns a bunch of info related to the file
      • The st_size attribute represents the size of the file
    • We print the size variable

    Try to run the Python script. You’ll get a different result depending on the content of your lorem.txt file.

    Output:

    20064

    The output is represented in bytes. This is not readable at all, so let’s humanize it so we can have a better perspective of the size of the file.

    First, install the humanize package, by running the following command in your shell:

    pip install humanize

    Then you can use the naturalsize() function that converts a value in bytes to readable file size, for instance, KB, MB, GB, or TB.

    import os
    from humanize import naturalsize
    
    size = os.stat('lorem.txt').st_size
    
    print(size)
    print(naturalsize(size))

    The code above first prints the size of the file in bytes, then prints the same size in a readable format.

    Output:

    20064
    20.1 kB
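    If installing a third-party package is not an option, a similar conversion can be approximated by hand. The format_size() helper below is a hypothetical, minimal sketch that assumes decimal units (1 kB = 1000 bytes); it is not the humanize implementation and may differ from it in edge cases.

```python
def format_size(num_bytes):
    """Format a byte count using decimal units (1 kB = 1000 bytes)."""
    if num_bytes < 1000:
        return f"{num_bytes} Bytes"

    units = ("kB", "MB", "GB", "TB")
    size = float(num_bytes)
    for unit in units:
        size /= 1000
        # Stop at the first unit that keeps the number below 1000,
        # or at the largest unit we know about.
        if size < 1000 or unit == units[-1]:
            return f"{size:.1f} {unit}"


print(format_size(20064))
print(format_size(1_500_000))
```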

    Using Pathlib

    Although pathlib is designed to work exclusively with paths, it incorporates some useful functions from other modules as methods of Path objects (Instances of the Path class).

    Create a file method2.py and import the Path class.

    from pathlib import Path

    Then create a Path object passing the path to the lorem.txt file as an argument.

    file_ = Path('lorem.txt')

    Now, you can access the stat() method of the Path class. It works the same as the os.stat() function, therefore you’ll be able to print the size of the file.

    print(file_.stat().st_size)

    Output:

    20064

    As you can see, we got the same result as with the first method we used. The result above is also printed in byte format, so we can use the humanize module to make it readable.

    from pathlib import Path
    from humanize import naturalsize
    
    size = Path('lorem.txt').stat().st_size
    
    print(naturalsize(size))

    This code produces the following output:

    20.1 kB
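    Path.stat() raises an error when the path does not exist, and it also happily returns a value for directories. One way to guard against both is sketched below; safe_file_size() is a hypothetical helper name, and the temporary files exist only to make the example self-contained.

```python
import tempfile
from pathlib import Path


def safe_file_size(path):
    """Return the size of a regular file in bytes, or None if it is not one."""
    candidate = Path(path)
    if not candidate.is_file():
        return None
    return candidate.stat().st_size


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes(b"a" * 2048)

    existing_size = safe_file_size(sample)
    missing_size = safe_file_size(Path(tmp) / "missing.txt")
    directory_size = safe_file_size(tmp)

print(existing_size, missing_size, directory_size)
```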

    Using Unix Commands with Subprocess

    The subprocess module allows us to call and manage subprocesses from Python. Therefore, we can run any command and process its output directly in Python.

    Note: This method only works if you’re running a Unix-like OS (Linux, macOS)

    Open a file method3.py and paste the code below:

    from subprocess import run
    
    process = run(['du', 'lorem.txt'], capture_output=True, text=True)
    
    print(process.stdout)

    Diving into this piece of code:

    • We import the run function from the subprocess module
    • The variable process contains the result of running the command du lorem.txt
      • du is a Unix utility that reports the disk space used by a file
      • capture_output gives us access to the stdout (standard output) attribute
      • text means we’re storing the output as a string instead of bytes
    • We print the standard output of the process

    If you run the code above you’ll get the following output:

    20      lorem.txt

    As you can see, it gives us the size and the name of the file. If you only want the size of the file, you’ll need to split the output (remember it’s a string) and print the first element.

    from subprocess import run
    
    process = run(['du', 'lorem.txt'], capture_output=True, text=True)
    
    size = process.stdout.split()[0]
    
    print(size)

    Output:

    20

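    This output isn’t very readable. The number is a count of disk blocks (1024-byte blocks with GNU du on Linux, 512-byte blocks by default on macOS), so it reflects the disk space the file occupies rather than its exact length in bytes. No one could guess the unit from the output alone.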
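    Part of the reason du can disagree with st_size is that it reports allocated disk space rather than the byte length of the file. The sketch below compares the two for a temporary file, using the st_blocks field of os.stat(), which counts 512-byte blocks on Unix-like systems and is not available on Windows. The allocated figure depends on the filesystem, so no particular value should be expected.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")

    # Write 20064 bytes and push them to disk before inspecting the file.
    with open(path, "wb") as fh:
        fh.write(b"\0" * 20064)
        fh.flush()
        os.fsync(fh.fileno())

    info = os.stat(path)

# Apparent size: the length of the file's content in bytes.
apparent_size = info.st_size

# Allocated size: number of 512-byte blocks reserved on disk (Unix only).
blocks = getattr(info, "st_blocks", None)
allocated_size = blocks * 512 if blocks is not None else None

print("apparent:", apparent_size)
print("allocated:", allocated_size)
```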

    To solve this problem, we can make use of the -h (human-readable) flag.

    Note: You can read the manual for this command by running man du, or du --help.

    from subprocess import run
    
    process = run(['du', '-h', 'lorem.txt'], capture_output=True, text=True)
    
    size = process.stdout.split()[0]
    
    print(size)

    Now the output of this script will be much more readable:

    20K
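    When the size is needed as a number rather than a label, one option is to ask du for a fixed unit and convert the result to an integer. The sketch below uses the -k flag (1024-byte units) together with -s (a single summary line); du_kibibytes() is a hypothetical helper name, and it returns None when du is missing or exits with an error.

```python
import shutil
import tempfile
from pathlib import Path
from subprocess import run


def du_kibibytes(path):
    """Return disk usage in 1024-byte units as reported by du, or None."""
    if shutil.which("du") is None:
        return None  # du is not installed (for example, on Windows)

    process = run(["du", "-sk", str(path)], capture_output=True, text=True)
    if process.returncode != 0:
        return None  # du reported an error, e.g. the path does not exist

    # The first whitespace-separated field of the output is the size.
    return int(process.stdout.split()[0])


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes(b"a" * 5000)
    usage = du_kibibytes(sample)

print(usage)
```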

    If you want to know more about the subprocess module and possible applications, check out our Python subprocess guide.

    Get the Size of a Folder Recursively

    If you want to get the size of a folder, you’ll need to iterate over each file present in the directory and its sub-directories. We’ll do it with two methods:

    • Iterating over a Path with pathlib
    • Using the du command with subprocess

    The following code uses a path to a test directory inside my home folder. You’ll need to replace that path with the directory whose size you want to get.

    Iterating over a Path with pathlib

    Let’s see how you can get the size of a directory by iterating over the sizes of the files.

    from pathlib import Path
    from humanize import naturalsize
    
    def get_size(path='.'):
        size = 0

        for file_ in Path(path).rglob('*'):
            # rglob('*') also yields directories, so count regular files only
            if file_.is_file():
                size += file_.stat().st_size

        return naturalsize(size)
    
    test_path = Path.home() / 'Documents/tests/'
    
    print(get_size(test_path))

    This piece of code may seem a little bit scary, so let’s break down what each part is doing.

    • Import the Path class and the naturalsize() function
    • Define the get_size() function with a parameter path, which points to the current directory by default.
    • The size variable is just a placeholder to which we’ll add the size of each file
    • Iterate over each entry under the path
      • The rglob() method recursively returns the entries that match the pattern
      • rglob('*') means we’re matching everything inside the directory and its sub-directories
    • Get the size of each file and add it to the size variable
    • Return the size variable in a human-readable way
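    The same traversal can be sketched with os.walk() from the os module, which yields the file names of each directory separately from its sub-directories. The directory_size_bytes() name is hypothetical, and skipping symbolic links is a design choice made here to avoid counting a file twice. The function returns raw bytes so the caller can decide how to format them.

```python
import os
import tempfile
from pathlib import Path


def directory_size_bytes(root="."):
    """Sum the sizes of all regular files under root, in bytes."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            # Skip symbolic links so linked files are not counted twice.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny tree: one file at the top level and one in a sub-directory.
    (Path(tmp) / "a.txt").write_bytes(b"a" * 100)
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "sub" / "b.txt").write_bytes(b"b" * 250)

    total_bytes = directory_size_bytes(tmp)

print(total_bytes)
```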

    Of course, I’m testing the function with a directory that exists only on my machine. Don’t forget to change the path to a folder that exists on your computer.

    In my case, I get the following output:

    403.4 MB

    Using the du Command with Subprocess

    This approach has some advantages:

    • It reports the disk space the folder actually occupies, rather than the sum of the file sizes
    • It’s usually faster on large directory trees

    from subprocess import run
    from pathlib import Path
    
    test_path = Path.home() / 'Documents/tests/'
    
    process = run(['du', '-sh', test_path], capture_output=True, text=True)
    
    size = process.stdout.split()[0]
    
    print(size)

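    We’re using the same approach as method 3, but this time we’re getting the size of a directory instead of a file. The -s flag tells du to print a single total for the directory instead of one line per sub-directory.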

    Output:

    481M

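    As you can see, these two ways of getting the size of a folder return different results. The pathlib version adds up the length of each file in bytes, while du counts the disk blocks allocated to the files and directories. In addition, du -h uses powers of 1024, whereas naturalsize() uses powers of 1000 by default, so the two numbers are not directly comparable. The bigger the directory, the larger the difference usually is.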
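    To compare the two outputs on equal terms, it helps to express both in the same unit. The arithmetic below is a small sketch that converts the 403.4 MB reported by the pathlib approach (decimal megabytes) into mebibytes, the unit du -h labels as "M". The variable names are illustrative only.

```python
# Value reported by the pathlib approach, in decimal megabytes.
decimal_megabytes = 403.4

# 1 MB = 1000 * 1000 bytes, 1 MiB = 1024 * 1024 bytes.
size_in_bytes = decimal_megabytes * 1000 * 1000
size_in_mebibytes = size_in_bytes / (1024 * 1024)

print(round(size_in_mebibytes, 1))
```

    After the conversion, the gap to the 481M reported by du is larger, not smaller, which suggests that most of the difference comes from how disk space is allocated rather than from the units.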

    It’s up to you to choose between the pathlib and subprocess approaches. If you know your code will only ever run on Linux, use subprocess; otherwise, use the pathlib solution.

    To Sum Up

    Python is extremely handy when interacting with the OS. You can automate processes and save a lot of time with Python. The main modules for interacting with the OS are os, sys, pathlib, and subprocess.

    In this tutorial you learned:

    • How Python interacts with the OS
    • The usage of built-in modules to make OS operations
    • How to use the humanize module to print human-readable sizes
    • To calculate the size of a file with 3 approaches
    • To calculate the size of a directory recursively or with the du command