  • Cleaning the file system by hand on a regular basis is tedious. Automate it!

    Deleting files and folders manually is not an exciting task, so it makes sense to automate it.

    Here comes Python to make our lives easier. Python is an excellent programming language for scripting. We are going to take advantage of Python to finish our task without any obstacle. First, you should know why Python is a good choice.

    • Python is a long-standing favorite for automating tasks
    • Python scripts usually need less code than the equivalent in many other programming languages
    • Python runs on all major operating systems. You can run the same code on Windows, Linux, and macOS.
    • Python has a module called os that helps us to interact with the operating system. We are going to use this module to automate the deletion of files.

    We can automate any annoying or repetitive system task using Python. Writing a script for a specific system task is a piece of cake if you know Python. Let’s look at the following use cases.

    Note: the following scripts were tested on Python 3.6+.

    Removing files/folders older than X days

    Often you don’t need old logs, and you regularly need to clean them to make storage available. It could be anything and not just logs.

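    The os module has a function called stat that returns the time of last access (st_atime), last modification (st_mtime), and last metadata change (st_ctime) of a file or folder. Note that st_ctime is the metadata change time on Unix but the creation time on Windows. All three attributes hold the time in seconds since the epoch, which is 1 January 1970, 00:00:00 UTC on Windows and most Unix systems.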
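    To make these timestamps concrete, here is a minimal sketch that reads them from a throwaway file and converts them to a readable date. The temporary file and the variable names are only for illustration and are not part of the scripts in this article.

```python
import os
import tempfile
import time

# Create a throwaway file so the example runs anywhere.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"sample log line\n")
    sample_path = handle.name

info = os.stat(sample_path)

# Each attribute is a float: seconds since the epoch.
for label, timestamp in (
    ("last access", info.st_atime),
    ("last modification", info.st_mtime),
    ("metadata change (creation on Windows)", info.st_ctime),
):
    readable = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))
    print(f"{label}: {timestamp:.0f} -> {readable}")

# Age of the file in days, based on its modification time.
age_in_days = (time.time() - info.st_mtime) / (24 * 60 * 60)
print(f"age: {age_in_days:.4f} days")

# Clean up the throwaway file.
os.remove(sample_path)
```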

    We will use the os.walk(path) function to traverse a folder and all of its subfolders.

    Follow the steps below to write code that deletes files/folders based on the number of days.
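    As a rough illustration of how os.walk behaves, the sketch below builds a small temporary folder tree and collects every file it finds. The folder and file names are made up for the demo.

```python
import os
import tempfile

# Build a small throwaway tree: <top>/readme.txt and <top>/logs/app.log
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "logs"))
    for relative in ("readme.txt", os.path.join("logs", "app.log")):
        with open(os.path.join(top, relative), "w") as handle:
            handle.write("demo\n")

    visited = []
    # os.walk yields one (current folder, subfolder names, file names) tuple per folder.
    for root_folder, folders, files in os.walk(top):
        for name in files:
            # The names are relative to root_folder, so join them to get a usable path.
            full_path = os.path.join(root_folder, name)
            visited.append(os.path.relpath(full_path, top))

print(sorted(visited))
```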

    • Import the modules time, os, shutil
    • Set the path and days to the variables
    • Compute the cutoff time by converting the number of days into seconds and subtracting it from the current time returned by time.time()
    • Check whether the path exists or not using the os.path.exists(path) function
    • If the path exists, then get the list of files and folders present in the path, including subfolders. Use the function os.walk(path); it returns a generator that yields the current folder, its subfolders, and its files for every folder in the tree
    • Get the path of the file or folder by joining the current folder and the file/folder name using the function os.path.join()
    • Get the ctime from the os.stat(path) function using the attribute st_ctime
    • Compare the ctime with the cutoff time we calculated previously
    • If the ctime is less than or equal to the cutoff, the item is older than the desired number of days, so check whether it is a file or folder. If it is a file, use os.remove(path); otherwise use shutil.rmtree()
    • If the path doesn’t exist, print a not found message
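    The age comparison at the heart of these steps can be sketched on its own. The helper name is_older_than and its now parameter are hypothetical; the parameter only exists so the check can be tried without waiting 30 days.

```python
import os
import tempfile
import time

SECONDS_PER_DAY = 24 * 60 * 60


def is_older_than(path, days, now=None):
    """Return True if the ctime of `path` lies at least `days` days before `now`."""
    if now is None:
        now = time.time()
    # Anything changed at or before the cutoff counts as old.
    cutoff = now - days * SECONDS_PER_DAY
    return os.stat(path).st_ctime <= cutoff


# Demonstrate with a file created just now.
with tempfile.NamedTemporaryFile() as handle:
    fresh = is_older_than(handle.name, days=30)
    # Pretend that 31 days have passed by moving `now` forward.
    later = time.time() + 31 * SECONDS_PER_DAY
    aged = is_older_than(handle.name, days=30, now=later)

print(fresh, aged)  # prints: False True
```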

    Let’s see the code in detail.

    # importing the required modules
    import os
    import shutil
    import time


    # main function
    def main():

        # initializing the count
        deleted_folders_count = 0
        deleted_files_count = 0

        # specify the path
        path = "/PATH_TO_DELETE"

        # specify the days
        days = 30

        # cutoff time: anything with a ctime at or before it is old enough
        # time.time() returns current time in seconds
        seconds = time.time() - (days * 24 * 60 * 60)

        # checking whether the path exists or not
        if not os.path.exists(path):

            # file/folder is not found
            print(f'"{path}" is not found')

        elif os.path.isfile(path):

            # the path is a single file, so check it directly
            if seconds >= get_file_or_folder_age(path) and remove_file(path):
                deleted_files_count += 1 # incrementing count

        elif seconds >= get_file_or_folder_age(path):

            # the top-level folder itself is old enough, so remove it entirely
            if remove_folder(path):
                deleted_folders_count += 1 # incrementing count

        else:

            # iterating over each and every folder and file in the path
            for root_folder, folders, files in os.walk(path):

                # checking the subfolders of the current folder
                # (looping over a copy because the list is changed below)
                for folder in list(folders):

                    # folder path
                    folder_path = os.path.join(root_folder, folder)

                    # comparing with the days
                    if seconds >= get_file_or_folder_age(folder_path):

                        # counting the folder only if it was really removed
                        if remove_folder(folder_path):
                            deleted_folders_count += 1 # incrementing count

                            # telling os.walk not to descend into the removed folder
                            folders.remove(folder)

                # checking the current directory files
                for file in files:

                    # file path
                    file_path = os.path.join(root_folder, file)

                    # comparing the days
                    if seconds >= get_file_or_folder_age(file_path):

                        # counting the file only if it was really removed
                        if remove_file(file_path):
                            deleted_files_count += 1 # incrementing count

        print(f"Total folders deleted: {deleted_folders_count}")
        print(f"Total files deleted: {deleted_files_count}")


    def remove_folder(path):

        # shutil.rmtree() returns None and raises OSError on failure
        try:
            shutil.rmtree(path)
        except OSError as error:
            # failure message
            print(f"Unable to delete the {path}: {error}")
            return False

        # success message
        print(f"{path} is removed successfully")
        return True


    def remove_file(path):

        # os.remove() returns None and raises OSError on failure
        try:
            os.remove(path)
        except OSError as error:
            # failure message
            print(f"Unable to delete the {path}: {error}")
            return False

        # success message
        print(f"{path} is removed successfully")
        return True


    def get_file_or_folder_age(path):

        # getting ctime of the file/folder
        # time will be in seconds since the epoch
        ctime = os.stat(path).st_ctime

        # returning the time
        return ctime


    if __name__ == '__main__':
        main()

    You need to adjust the following two variables in the above code based on the requirement.

    days = 30 
    path = "/PATH_TO_DELETE"

    Removing files larger than X GB

    Let’s search for the files that are larger than a particular size and delete them. It is similar to the above script. In the previous script, we used age as the criterion, and now we will use size as the criterion for the deletion. The script takes the limit in megabytes, so a limit of 1 GB is written as 1024.

    # importing the os module
    import os


    # function that returns size of a file
    def get_file_size(path):

        # getting file size in bytes
        size = os.path.getsize(path)

        # returning the size of the file
        return size


    # function to delete a file
    def remove_file(path):

        # os.remove() returns None and raises OSError on failure
        try:
            os.remove(path)
        except OSError as error:
            # error
            print(f"Unable to delete the {path}: {error}")
            return False

        # success
        print(f"{path} is deleted successfully")
        return True


    def main():
        # specify the path
        path = "ENTER_PATH_HERE"

        # put max size of file in MBs
        size = 500

        # converting size to bytes
        size = size * 1024 * 1024

        # checking whether the path exists or not
        if not os.path.exists(path):

            # path doesn't exist
            print(f"{path} doesn't exist")

        elif os.path.isfile(path):

            # path is not a dir
            # checking the file directly
            if get_file_size(path) >= size:
                # invoking the remove_file function
                remove_file(path)

        else:

            # traversing through the subfolders
            for root_folder, folders, files in os.walk(path):

                # iterating over the files list
                for file in files:

                    # getting file path
                    file_path = os.path.join(root_folder, file)

                    # checking the file size
                    if get_file_size(file_path) >= size:
                        # invoking the remove_file function
                        remove_file(file_path)


    if __name__ == '__main__':
        main()

    Adjust the following two variables.

    path = "ENTER_PATH_HERE" 
    size = 500
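    If you would rather think in gigabytes, the conversion is simple arithmetic. The two helper names below are hypothetical and only illustrate the calculation, using 1 MB = 1024 * 1024 bytes as in the script above.

```python
def megabytes_to_bytes(megabytes):
    """Convert a size in MB to bytes (1 MB = 1024 * 1024 bytes)."""
    return int(megabytes * 1024 * 1024)


def gigabytes_to_bytes(gigabytes):
    """Convert a size in GB to bytes (1 GB = 1024 MB)."""
    return int(gigabytes * 1024 ** 3)


print(megabytes_to_bytes(500))  # prints: 524288000
print(gigabytes_to_bytes(2))    # prints: 2147483648
```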

    Removing files with a specific extension

    There might be a scenario where you want to delete files by their extension, for example .log files. We can find the extension of a file using the os.path.splitext(path) function. It returns a tuple containing the path without the extension and the extension itself, including the leading dot.
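    A few sample calls show how os.path.splitext splits different names. The file names are made up for illustration; note that only the last extension is split off and that the comparison is case-sensitive.

```python
import os

results = {}
for name in ("server.log", "backup.tar.gz", "notes", ".bashrc", "REPORT.LOG"):
    root, extension = os.path.splitext(name)
    results[name] = (root, extension)
    print(f"{name!r}: root={root!r}, extension={extension!r}")

# ".LOG" is not equal to ".log"; lowercase the extension to match both.
matches_log = os.path.splitext("REPORT.LOG")[1].lower() == ".log"
print(matches_log)
```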

    # importing os module
    import os
    
    # main function
    def main():
        
        # specify the path
        path = "PATH_TO_LOOK_FOR"
        
        # specify the extension
        extension = ".log"
        
        # checking whether the path exist or not
        if os.path.exists(path):
            
            # check whether the path is directory or not
            if os.path.isdir(path):
            
                # iterating through the subfolders
                for root_folder, folders, files in os.walk(path):
                    
                    # checking of the files
                    for file in files:
    
                        # file path
                        file_path = os.path.join(root_folder, file)
    
                        # extracting the extension from the filename
                        file_extension = os.path.splitext(file_path)[1]
    
                        # checking the file_extension
                        if extension == file_extension:
                            
                            # deleting the file
                            # os.remove() returns None and raises OSError on failure
                            try:
                                os.remove(file_path)
                            except OSError as error:
                                # failure message
                                print(f"Unable to delete the {file_path}: {error}")
                            else:
                                # success message
                                print(f"{file_path} deleted successfully")
            
            else:
                
                # path is not a directory
                print(f"{path} is not a directory")
        
        else:
            
            # path doesn't exist
            print(f"{path} doesn't exist")
    
    if __name__ == '__main__':
        # invoking main function
        main()

    Don’t forget to update the path and extension variables in the above code to meet your requirements.

    I would suggest testing the scripts in a non-production environment first. Once you are satisfied with the results, you can schedule them through cron (if using Linux) to run periodically for maintenance work. Python is well suited to this kind of task.
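    One way to test safely is a dry run that only reports what would be deleted. The sketch below is an assumption about how that could look, not part of the scripts above: the function name delete_old_files and its dry_run and now parameters are hypothetical.

```python
import os
import tempfile
import time

SECONDS_PER_DAY = 24 * 60 * 60


def delete_old_files(path, days, dry_run=True, now=None):
    """Return files under `path` older than `days`; delete them only if dry_run is False."""
    if now is None:
        now = time.time()
    cutoff = now - days * SECONDS_PER_DAY
    matched = []
    for root_folder, folders, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root_folder, name)
            try:
                if os.stat(file_path).st_ctime > cutoff:
                    continue  # too recent, keep it
                if not dry_run:
                    os.remove(file_path)
            except OSError as error:
                # The file vanished or could not be read or removed.
                print(f"Skipping {file_path}: {error}")
                continue
            matched.append(file_path)
    return matched


# Try it on a throwaway folder, pretending that 31 days have passed.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "old.log")
    with open(sample, "w") as handle:
        handle.write("demo\n")

    future = time.time() + 31 * SECONDS_PER_DAY
    preview = delete_old_files(folder, days=30, now=future)
    still_there = os.path.exists(sample)
    removed = delete_old_files(folder, days=30, dry_run=False, now=future)
    gone = not os.path.exists(sample)

print(preview == removed, still_there, gone)  # prints: True True True
```

    Running with dry_run=True first and reviewing the returned list before switching to dry_run=False keeps mistakes in the path or days settings from costing you data.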