Cleaning up the file system by hand on a regular basis is tedious. Automate it!
Deleting files and folders manually is not an exciting task, as you can imagine. It makes sense to automate it.
Here comes Python to make our lives easier. Python is an excellent programming language for scripting. We are going to take advantage of Python to finish our task without any obstacle. First, you should know why Python is a good choice.
- Python is a long-standing favorite for automating tasks
- Python scripts usually need less code than equivalents in many other programming languages
- Python runs on all major operating systems. You can run the same code on Windows, Linux, and macOS.
- Python has a module called os that helps us interact with the operating system. We are going to use this module to automate the deletion of files.
We can automate any annoying or repetitive system task using Python. Writing a script for a specific system task is a piece of cake if you know Python. Let’s look at the following use cases.
Note: the following scripts were tested on Python 3.6+.
Removing files/folders older than X days
Often you don’t need old logs, and you regularly need to clean them up to free storage. The same applies to any kind of file, not just logs.
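The os module has a function called stat that returns details about a file or folder, including the time of last access (st_atime), last modification (st_mtime), and last metadata change (st_ctime). All three attributes hold the time in seconds since the epoch, which is January 1, 1970, 00:00:00 UTC on most platforms. Note that on Windows, st_ctime holds the creation time rather than the last metadata change.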
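As a rough illustration of these attributes, the sketch below creates a throwaway file, prints its three timestamps, and converts one of them into an age in days. The helper name age_in_days is hypothetical and not part of the os module. It reads st_mtime rather than st_ctime only because the modification time can be set with os.utime, which makes the demonstration reproducible.

```python
import os
import tempfile
import time

SECONDS_PER_DAY = 24 * 60 * 60

def age_in_days(path, now=None):
    """Return the days elapsed since the last modification (st_mtime) of path."""
    # 'now' can be passed in explicitly, which makes the helper easy to test
    now = time.time() if now is None else now
    return (now - os.stat(path).st_mtime) / SECONDS_PER_DAY

# create a throwaway file so the example is safe to run anywhere
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp_path = tmp.name

info = os.stat(tmp_path)
print("last access:      ", info.st_atime)
print("last modification:", info.st_mtime)
print("st_ctime:         ", info.st_ctime)

# pretend the file was last accessed and modified 40 days ago
forty_days_ago = time.time() - 40 * SECONDS_PER_DAY
os.utime(tmp_path, (forty_days_ago, forty_days_ago))

file_age = age_in_days(tmp_path)
print(f"age in days: {file_age:.1f}")

# clean up the throwaway file
os.remove(tmp_path)
```

Comparing an age like this against a threshold is the same check the script in this section performs, only expressed in days instead of raw seconds.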
We will use a function called os.walk(path) to traverse the subfolders of a folder.
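To make the shape of the os.walk results concrete, here is a small sketch that builds a temporary folder tree and walks it. The folder and file names (logs, old.log, readme.txt) are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # build a tiny tree: base/readme.txt and base/logs/old.log
    os.makedirs(os.path.join(base, "logs"))
    for relative in ("readme.txt", os.path.join("logs", "old.log")):
        with open(os.path.join(base, relative), "w") as handle:
            handle.write("sample")

    visited = []
    # every iteration yields the current folder, its subfolder names, and its file names
    for root_folder, folders, files in os.walk(base):
        relative_root = os.path.relpath(root_folder, base)
        visited.append((relative_root, sorted(folders), sorted(files)))
        print(relative_root, folders, files)
```

The names in folders and files are relative to root_folder, which is why they have to be joined with os.path.join() before being passed to functions such as os.stat().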
Follow the steps below to write code that deletes files/folders based on their age in days.
- Import the modules time, os, shutil
- Assign the path and the number of days to variables
- Compute the cutoff time by converting the number of days into seconds and subtracting the result from the current time returned by time.time()
- Check whether the path exists or not using the os.path.exists(path) function
- If the path exists, then get the list of files and folders present in the path, including subfolders. Use the function os.walk(path), which returns a generator that yields the current folder, its subfolders, and its files for every folder in the tree
- Get the path of the file or folder by joining both the current path and file/folder name using the function os.path.join()
- Get the ctime from the os.stat(path) result using the attribute st_ctime
- Compare the ctime with the cutoff time we have calculated previously
- If the ctime is at or before the cutoff, the item is older than the desired number of days, so check whether it is a file or folder. If it is a file, use os.remove(path), else use shutil.rmtree()
- If the path doesn’t exist, print a not-found message
Let’s see the code in detail.
# importing the required modules
import os
import shutil
import time

# main function
def main():
    # initializing the count
    deleted_folders_count = 0
    deleted_files_count = 0
    # specify the path
    path = "/PATH_TO_DELETE"
    # specify the days
    days = 30
    # computing the cutoff time: anything last changed at or before it is old enough
    # time.time() returns current time in seconds
    seconds = time.time() - (days * 24 * 60 * 60)
    # checking whether the file is present in path or not
    if os.path.exists(path):
        # checking whether the path is a directory or not
        if os.path.isdir(path):
            # iterating over each and every folder and file in the path
            for root_folder, folders, files in os.walk(path):
                # comparing the days
                if seconds >= get_file_or_folder_age(root_folder):
                    # removing the folder
                    if remove_folder(root_folder):
                        deleted_folders_count += 1  # incrementing count
                    # breaking after removing the root_folder
                    break
                else:
                    # checking folder from the root_folder
                    # (iterating over a copy, because the list is changed below)
                    for folder in list(folders):
                        # folder path
                        folder_path = os.path.join(root_folder, folder)
                        # comparing with the days
                        if seconds >= get_file_or_folder_age(folder_path):
                            # invoking the remove_folder function
                            if remove_folder(folder_path):
                                deleted_folders_count += 1  # incrementing count
                                # telling os.walk not to descend into the removed folder
                                folders.remove(folder)
                    # checking the current directory files
                    for file in files:
                        # file path
                        file_path = os.path.join(root_folder, file)
                        # comparing the days
                        if seconds >= get_file_or_folder_age(file_path):
                            # invoking the remove_file function
                            if remove_file(file_path):
                                deleted_files_count += 1  # incrementing count
        else:
            # if the path is not a directory
            # comparing with the days
            if seconds >= get_file_or_folder_age(path):
                # invoking the remove_file function
                if remove_file(path):
                    deleted_files_count += 1  # incrementing count
    else:
        # file/folder is not found
        print(f'"{path}" is not found')
    print(f"Total folders deleted: {deleted_folders_count}")
    print(f"Total files deleted: {deleted_files_count}")

def remove_folder(path):
    # removing the folder and everything inside it, including newer files
    # shutil.rmtree() returns None and raises OSError on failure
    try:
        shutil.rmtree(path)
    except OSError as error:
        # failure message
        print(f"Unable to delete the {path}: {error}")
        return False
    # success message
    print(f"{path} is removed successfully")
    return True

def remove_file(path):
    # removing the file
    # os.remove() returns None and raises OSError on failure
    try:
        os.remove(path)
    except OSError as error:
        # failure message
        print(f"Unable to delete the {path}: {error}")
        return False
    # success message
    print(f"{path} is removed successfully")
    return True

def get_file_or_folder_age(path):
    # getting ctime of the file/folder
    # time will be in seconds since the epoch
    ctime = os.stat(path).st_ctime
    # returning the time
    return ctime

if __name__ == '__main__':
    main()
You need to adjust the following two variables in the above code based on your requirements.
days = 30
path = "/PATH_TO_DELETE"
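If editing the script before every run is inconvenient, one possible alternative is to read both values from the command line. The following is only a sketch using the standard argparse module. The parse_args helper and the --days option name are assumptions made for illustration and are not part of the script above.

```python
import argparse

def parse_args(argv=None):
    """Parse the path and the age threshold (hypothetical command-line interface)."""
    parser = argparse.ArgumentParser(
        description="Delete files/folders older than a number of days"
    )
    parser.add_argument("path", help="file or folder to clean up")
    parser.add_argument("--days", type=int, default=30,
                        help="age threshold in days (default: 30)")
    # argv=None makes argparse read sys.argv[1:], a list is used here for the demo
    return parser.parse_args(argv)

# simulating the command line: script.py /PATH_TO_DELETE --days 7
args = parse_args(["/PATH_TO_DELETE", "--days", "7"])
print(args.path, args.days)
```

Inside main(), the path and days variables could then be taken from args.path and args.days instead of being hard-coded.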
Removing files larger than X GB
Let’s search for files that are larger than a particular size and delete them. The script is similar to the previous one: there we used age as the criterion for deletion, and here we use size. Note that the script takes the size limit in MB, so a limit of 1 GB is written as 1024.
# importing the os module
import os

# function that returns size of a file
def get_file_size(path):
    # getting file size in bytes
    size = os.path.getsize(path)
    # returning the size of the file
    return size

# function to delete a file
def remove_file(path):
    # deleting the file
    # os.remove() returns None and raises OSError on failure
    try:
        os.remove(path)
    except OSError as error:
        # error
        print(f"Unable to delete the {path}: {error}")
    else:
        # success
        print(f"{path} is deleted successfully")

def main():
    # specify the path
    path = "ENTER_PATH_HERE"
    # put max size of file in MBs
    size = 500
    # checking whether the path exists or not
    if os.path.exists(path):
        # converting size to bytes
        size = size * 1024 * 1024
        # checking whether the path is a directory or a file
        if os.path.isdir(path):
            # traversing through the subfolders
            for root_folder, folders, files in os.walk(path):
                # iterating over the files list
                for file in files:
                    # getting file path
                    file_path = os.path.join(root_folder, file)
                    # checking the file size
                    if get_file_size(file_path) >= size:
                        # invoking the remove_file function
                        remove_file(file_path)
        else:
            # path is not a dir
            # checking the file directly
            if get_file_size(path) >= size:
                # invoking the remove_file function
                remove_file(path)
    else:
        # path doesn't exist
        print(f"{path} doesn't exist")

if __name__ == '__main__':
    main()
Adjust the following two variables.
path = "ENTER_PATH_HERE"
size = 500
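Because deletion cannot be undone, it may help to list the candidates before removing anything. The sketch below is one way to do such a dry run. The function find_large_files is a hypothetical helper, and the file names and sizes are invented so that the example runs inside a temporary folder.

```python
import os
import tempfile

def find_large_files(path, max_size_mb):
    """Return the paths of files under 'path' that are at least max_size_mb in size."""
    limit = max_size_mb * 1024 * 1024  # converting MB to bytes
    matches = []
    for root_folder, _folders, files in os.walk(path):
        for file in files:
            file_path = os.path.join(root_folder, file)
            # os.path.getsize() returns the size in bytes
            if os.path.getsize(file_path) >= limit:
                matches.append(file_path)
    return matches

with tempfile.TemporaryDirectory() as base:
    small = os.path.join(base, "small.bin")
    large = os.path.join(base, "large.bin")
    with open(small, "wb") as handle:
        handle.write(b"\0" * 1024)               # 1 KB file
    with open(large, "wb") as handle:
        handle.write(b"\0" * (2 * 1024 * 1024))  # 2 MB file

    # listing files of at least 1 MB without deleting anything
    large_files = [os.path.basename(p) for p in find_large_files(base, 1)]
    print(large_files)
```

Once the printed list looks right, the same paths can be handed to a deletion function such as remove_file().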
Removing files with a specific extension
There might be a scenario where you want to delete files by their extension, for example .log files. We can find the extension of a file using the os.path.splitext(path) function. It returns a tuple containing the path without the extension and the extension itself, including the leading dot.
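A few sample calls show how the split behaves, including some edge cases worth knowing before deleting by extension. The file names below are arbitrary examples.

```python
import os

examples = ["app.log", "archive.tar.gz", "/var/log/syslog", ".bashrc", "ERROR.LOG"]

# mapping each name to the (root, extension) tuple returned by splitext
results = {name: os.path.splitext(name) for name in examples}

for name, (root, extension) in results.items():
    print(f"{name!r} -> root={root!r}, extension={extension!r}")
```

Only the last suffix counts as the extension, a name without a dot (or with only a leading dot) has an empty extension, and the comparison with ".log" is case-sensitive, so a file named ERROR.LOG would not match unless the extension is lowercased first.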
# importing os module
import os

# main function
def main():
    # specify the path
    path = "PATH_TO_LOOK_FOR"
    # specify the extension
    extension = ".log"
    # checking whether the path exists or not
    if os.path.exists(path):
        # check whether the path is directory or not
        if os.path.isdir(path):
            # iterating through the subfolders
            for root_folder, folders, files in os.walk(path):
                # checking of the files
                for file in files:
                    # file path
                    file_path = os.path.join(root_folder, file)
                    # extracting the extension from the filename
                    file_extension = os.path.splitext(file_path)[1]
                    # checking the file_extension
                    if extension == file_extension:
                        # deleting the file
                        # os.remove() returns None and raises OSError on failure
                        try:
                            os.remove(file_path)
                        except OSError as error:
                            # failure message
                            print(f"Unable to delete the {file_path}: {error}")
                        else:
                            # success message
                            print(f"{file_path} deleted successfully")
        else:
            # path is not a directory
            print(f"{path} is not a directory")
    else:
        # path doesn't exist
        print(f"{path} doesn't exist")

if __name__ == '__main__':
    # invoking main function
    main()
Don’t forget to update the path and extension variables in the above code to meet your requirements.
I would suggest testing the scripts in a non-production environment first. Once you are satisfied with the results, you can schedule them with cron (if you are using Linux) to run periodically for maintenance work. Python is a great fit for this kind of task.