Coding — Unit 4#

Compressing Files in Python#

To compress a file in Python, you can use the zipfile module, which provides tools for creating, reading, writing, and extracting ZIP files. Here’s a simple example to demonstrate how to compress a file using this module:

Compressing a File#

import zipfile
import os

def compress_file(file_path, output_path):
    with zipfile.ZipFile(output_path, 'w', zipfile.ZIP_DEFLATED) as zipf:
        zipf.write(file_path, os.path.basename(file_path))

# Example usage
file_to_compress = 'example.txt'  # Replace with your file path
compressed_file = 'example.zip'   # Replace with your desired output file path

compress_file(file_to_compress, compressed_file)
print(f"File {file_to_compress} compressed to {compressed_file}")

In this example, example.txt is the file you want to compress, and example.zip is the output ZIP file. The os.path.basename(file_path) function ensures that only the file name is included in the ZIP archive, not the full path.

Compressing Multiple Files#

If you want to compress multiple files into a single ZIP archive, you can modify the function to accept a list of files:

import zipfile

def compress_files(file_paths, output_path):
    with zipfile.ZipFile(output_path, 'w', zipfile.ZIP_DEFLATED) as zipf:
        for file_path in file_paths:
            zipf.write(file_path, os.path.basename(file_path))

# Example usage
files_to_compress = ['example1.txt', 'example2.txt']  # Replace with your file paths
compressed_file = 'examples.zip'   # Replace with your desired output file path

compress_files(files_to_compress, compressed_file)
print(f"Files {files_to_compress} compressed to {compressed_file}")

Compressing a Directory#

To compress all files in a directory, you can walk through the directory and add each file to the ZIP archive:

import zipfile
import os

def compress_directory(directory_path, output_path):
    with zipfile.ZipFile(output_path, 'w', zipfile.ZIP_DEFLATED) as zipf:
        for root, dirs, files in os.walk(directory_path):
            for file in files:
                file_path = os.path.join(root, file)
                zipf.write(file_path, os.path.relpath(file_path, directory_path))

# Example usage
directory_to_compress = 'example_directory'  # Replace with your directory path
compressed_file = 'example_directory.zip'    # Replace with your desired output file path

compress_directory(directory_to_compress, compressed_file)
print(f"Directory {directory_to_compress} compressed to {compressed_file}")

In this script, os.walk(directory_path) is used to iterate through all files in the specified directory and its subdirectories. os.path.relpath(file_path, directory_path) ensures that the directory structure is preserved in the ZIP archive.

Encrypting files in Python#

To encrypt files in Python, you can use the cryptography library, which provides a simple and secure way to encrypt and decrypt files.

Install the cryptography library#

First, you need to install the library. You can do this using pip:

pip install cryptography

Encrypting a File#

Here’s a Python script that demonstrates how to encrypt a file using the cryptography library:

from cryptography.fernet import Fernet

# Generate a key for encryption
def generate_key():
    key = Fernet.generate_key()
    with open('secret.key', 'wb') as key_file:
        key_file.write(key)

# Load the encryption key
def load_key():
    return open('secret.key', 'rb').read()

# Encrypt a file
def encrypt_file(file_path):
    key = load_key()
    fernet = Fernet(key)

    with open(file_path, 'rb') as file:
        original_data = file.read()

    encrypted_data = fernet.encrypt(original_data)

    with open(file_path + '.enc', 'wb') as encrypted_file:
        encrypted_file.write(encrypted_data)

# Example usage
generate_key()  # Run this only once to generate and save the key
file_to_encrypt = 'example.txt'  # Replace with your file path

encrypt_file(file_to_encrypt)
print(f"File {file_to_encrypt} encrypted.")

In this script:

  • generate_key() generates and saves an encryption key to a file named secret.key.

  • load_key() loads the encryption key from the file.

  • encrypt_file() encrypts the specified file using the key.

Decrypting a File#

To decrypt the file, you can use the following script:

from cryptography.fernet import Fernet

# Load the encryption key
def load_key():
    return open('secret.key', 'rb').read()

# Decrypt a file
def decrypt_file(file_path):
    key = load_key()
    fernet = Fernet(key)

    with open(file_path, 'rb') as encrypted_file:
        encrypted_data = encrypted_file.read()

    decrypted_data = fernet.decrypt(encrypted_data)

    with open(file_path.replace('.enc', ''), 'wb') as decrypted_file:
        decrypted_file.write(decrypted_data)

# Example usage
file_to_decrypt = 'example.txt.enc'  # Replace with your encrypted file path

decrypt_file(file_to_decrypt)
print(f"File {file_to_decrypt} decrypted.")

In this script:

  • load_key() loads the encryption key from the file.

  • decrypt_file() decrypts the specified file using the key and saves the decrypted data.

Important Notes:#

  • Key Management: The encryption key (secret.key) must be kept secure. If it is lost, you won’t be able to decrypt your files. If it is stolen, anyone with the key can decrypt your files.

  • File Extensions: The encrypted file is saved with an .enc extension to differentiate it from the original file. You can change this as needed.

Hashing files in Python#

Hashing a file in Python can be done using the hashlib module, which provides various hash functions such as SHA-256, MD5, and others.

Hashing a File#

Here’s a Python script that demonstrates how to hash a file using SHA-256:

import hashlib

def hash_file(file_path):
    # Create a SHA-256 hash object
    sha256_hash = hashlib.sha256()
    
    # Open the file in binary mode
    with open(file_path, 'rb') as file:
        # Read the file in chunks to handle large files
        for byte_block in iter(lambda: file.read(4096), b''):
            sha256_hash.update(byte_block)
    
    # Return the hexadecimal digest of the hash
    return sha256_hash.hexdigest()

# Example usage
file_to_hash = 'example.txt'  # Replace with your file path

file_hash = hash_file(file_to_hash)
print(f"The SHA-256 hash of the file is: {file_hash}")

In this script:

  • hashlib.sha256() creates a SHA-256 hash object.

  • The file is opened in binary mode ('rb') to ensure it works with all types of files, including text and binary files.

  • The file is read in chunks of 4096 bytes to handle large files efficiently.

  • The update() method is called for each chunk to update the hash object.

  • The hexdigest() method returns the hash value as a hexadecimal string.

Hashing with Other Algorithms#

You can easily switch to other hash algorithms provided by hashlib (e.g., MD5, SHA-1) by replacing hashlib.sha256() with the desired hash function:

import hashlib

def hash_file(file_path, algorithm='sha256'):
    # Create a hash object based on the chosen algorithm
    hash_obj = hashlib.new(algorithm)
    
    # Open the file in binary mode
    with open(file_path, 'rb') as file:
        # Read the file in chunks to handle large files
        for byte_block in iter(lambda: file.read(4096), b''):
            hash_obj.update(byte_block)
    
    # Return the hexadecimal digest of the hash
    return hash_obj.hexdigest()

# Example usage
file_to_hash = 'example.txt'  # Replace with your file path

# Hash the file using SHA-256
file_hash_sha256 = hash_file(file_to_hash, 'sha256')
print(f"The SHA-256 hash of the file is: {file_hash_sha256}")

# Hash the file using MD5 (for example, though it's less secure)
file_hash_md5 = hash_file(file_to_hash, 'md5')
print(f"The MD5 hash of the file is: {file_hash_md5}")

This script allows you to specify the hash algorithm you want to use. Just pass the desired algorithm name (e.g., 'sha256', 'md5', 'sha1') to the hash_file function.

Important Notes:#

  • Security Considerations: Some hash algorithms like MD5 and SHA-1 are considered weak and should not be used for security-critical applications. SHA-256 and SHA-3 are generally recommended for secure hashing.

  • File Integrity: Hashing is commonly used to verify file integrity. By comparing the hash of a downloaded file with a known hash value, you can ensure the file has not been tampered with.