Data Security Management 📝

Learning Goals

By the end of this section you will:

  • understand the principles of cybersecurity and why it is important

  • understand the basic principles of encryption and why it is important

  • understand the process steps of traditional ciphers and be able to execute them

  • understand how modern symmetric encryption works

  • understand how modern asymmetric encryption works

  • understand the processes of hashing and when it should be used

  • understand the two types of compression and why they are important

Data has never been more valuable. It is an increasingly important driver of growth in our modern economy. Whether it is data about individuals, businesses or government, data underpins how we communicate, conduct business, and receive services. At the same time, it can be stolen, manipulated or used as a weapon by foreign adversaries and criminals (Department of Home Affairs, 2023).

Data security is crucial because it safeguards sensitive information from unauthorized access, ensuring that personal, financial, and confidential data remains private and protected. In today’s digital age, where vast amounts of data are shared and stored online, the risk of data breaches and cyberattacks is significant. These breaches can result in identity theft, financial loss, reputational damage, and legal consequences.

Data breaches are far more common than most people believe. The Office of the Australian Information Commissioner publishes regular reports on breaches involving Australian organisations. Check the latest report to see the extent of the data breaches occurring. Note that these are only the breaches that Australian law requires organisations to report.

Data security measures, including encryption, access controls, and regular security audits, help prevent such incidents and ensure that individuals and organizations can trust that their data is safe and confidential. It’s not only about protecting information but also about maintaining trust, compliance with regulations, and upholding the integrity of data in an increasingly interconnected world.

The following techniques are used to manage data effectively, ensuring it can be stored and transferred securely and efficiently.

The CIA Triad: Core Principles of Cybersecurity 📝

CIA Triad

The CIA Triad stands for Confidentiality, Integrity, and Availability. It’s a model that helps us understand how to protect information and systems from threats. Every good security strategy aims to balance these three goals.

Confidentiality – Keeping Information Private

Confidentiality means only authorized people can access sensitive data. It prevents data from being read or stolen by people who shouldn’t see it.

Key Concepts:

Examples:

Integrity – Keeping Information Accurate and Trustworthy

Integrity means data should stay correct, complete, and unaltered unless changed in an authorized way. It protects against tampering, accidental deletion, or corruption.

Key Concepts:

Examples:

Availability – Keeping Systems and Data Usable

Availability ensures that systems and data are accessible when users need them. Attacks that stop people from using services (like websites or apps) are attacks on availability.

Key Concepts:

Examples:

Why It Matters

If any part of the CIA Triad is compromised, security is at risk:

| Attack Type | Affects... | Example |
| --- | --- | --- |
| Data theft (e.g. a data breach) | Confidentiality | A hacker steals customer info from a database |
| Data tampering | Integrity | A virus changes invoice numbers in a billing system |
| Denial of Service (DoS/DDoS) attack | Availability | A website goes offline due to a flood of fake traffic |

Encryption 📝

Encryption is the process of converting data into a coded format to prevent unauthorized access. It ensures that only authorized parties can read the information.

How Encryption works

Why Encryption is important

Data encryption is necessary for several important reasons:

Types of Encryption


Traditional Ciphers 📝

Caesar cipher 📝

The Caesar cipher is one of the simplest and most well-known encryption techniques. It is a type of substitution cipher where each letter in the plaintext is shifted a fixed number of places down or up the alphabet. It is named after Julius Caesar, who reportedly used it to protect his private correspondence.

How the Caesar Cipher Works

  1. Choose a Shift Value: Decide on the number of positions each letter will be shifted. For example, with a shift of 3:

    • A becomes D

    • B becomes E

    • C becomes F

    • ..., and so on.

  2. Encrypt the Plaintext: Replace each letter in the plaintext with the letter that appears a fixed number of positions down the alphabet.

    • For example, with a shift of 3, the word “HELLO” becomes “KHOOR”.

  3. Decrypt the Ciphertext: To decrypt the message, shift the letters in the opposite direction by the same number of positions.

    • “KHOOR” with a shift of 3 back becomes “HELLO”.

    Caesar Example

Caesar Encryption Pseudocode

FUNCTION encrypt_caesar(plaintext, shift):
    alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    result = ""

    FOR each character 'char' in 'plaintext':
        IF char IS a letter:
            char = UPPERCASE(char)
            position = POSITION(char IN alphabet)
            new_position = (position + shift) MOD 26
            result = result + alphabet[new_position]
        ELSE:
            result = result + char

    RETURN result

Caesar Decryption Pseudocode

FUNCTION decrypt_caesar(ciphertext, shift):
    alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    result = ""

    FOR each character 'char' in 'ciphertext':
        IF char IS a letter:
            char = UPPERCASE(char)
            position = POSITION(char IN alphabet)
            new_position = (position - shift + 26) MOD 26
            result = result + alphabet[new_position]
        ELSE:
            result = result + char

    RETURN result
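The pseudocode above can be sketched in Python as follows. This is a minimal sketch rather than a definitive implementation: the function names mirror the pseudocode, and like the pseudocode it converts letters to uppercase and leaves any other character unchanged.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def encrypt_caesar(plaintext, shift):
    """Shift every letter 'shift' places along the alphabet."""
    result = ""
    for char in plaintext.upper():
        if char in ALPHABET:
            position = ALPHABET.index(char)
            # MOD 26 wraps around the end of the alphabet (X -> A, and so on)
            result += ALPHABET[(position + shift) % 26]
        else:
            # Spaces, digits and punctuation are copied unchanged
            result += char
    return result


def decrypt_caesar(ciphertext, shift):
    """Decryption is encryption with the shift reversed."""
    return encrypt_caesar(ciphertext, -shift)


print(encrypt_caesar("HELLO", 3))  # KHOOR
print(decrypt_caesar("KHOOR", 3))  # HELLO
```

Because there are only 25 useful shift values, anyone can recover the message by simply trying every shift in turn.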

Caesar Characteristics and Security

Vigenère Cipher 📝

The Vigenère cipher is a method of encrypting alphabetic text by using a simple form of polyalphabetic substitution. It employs a keyword to determine the shift applied to each letter of the plaintext, making it more secure than the Caesar cipher.

How the Vigenère Cipher Works

  1. Choose a Keyword: A keyword is selected, and each letter of the keyword is used to create a different Caesar cipher shift.

    • For example, if the keyword is “KEY”, it corresponds to shifts of K=10, E=4, and Y=24.

  2. Repeat the Keyword: The keyword is repeated to match the length of the plaintext.

    • For example, if the plaintext is “ATTACKATDAWN” and the keyword is “KEY”, the repeated keyword is “KEYKEYKEYKEY”.

  3. Encrypt the Plaintext: Each letter of the plaintext is shifted according to the corresponding letter of the keyword. The shift is determined by converting the keyword letter into a number (A=0, B=1, ..., Z=25).

  4. Decrypt the Ciphertext: To decrypt the message, the same keyword is used. Each letter of the ciphertext is shifted back according to the corresponding letter of the keyword.

    Vigenère Example

Vigenère Encryption Pseudocode

FUNCTION encrypt_vigenere(plaintext, keyword):
    alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    result = ""
    keyword_length = LENGTH(keyword)

    FOR i FROM 0 TO LENGTH(plaintext) - 1:
        char = plaintext[i]

        IF char IS a letter:
            char = UPPERCASE(char)
            shift_char = UPPERCASE(keyword[i MOD keyword_length])
            shift = POSITION(shift_char IN alphabet)
            char_position = POSITION(char IN alphabet)
            new_position = (char_position + shift) MOD 26
            result += alphabet[new_position]
        ELSE:
            result += char

    RETURN result

Vigenère Decryption Pseudocode

FUNCTION decrypt_vigenere(ciphertext, keyword):
    alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    result = ""
    keyword_length = LENGTH(keyword)

    FOR i FROM 0 TO LENGTH(ciphertext) - 1:
        char = ciphertext[i]

        IF char IS a letter:
            char = UPPERCASE(char)
            shift_char = UPPERCASE(keyword[i MOD keyword_length])
            shift = POSITION(shift_char IN alphabet)
            char_position = POSITION(char IN alphabet)
            new_position = (char_position - shift + 26) MOD 26
            result += alphabet[new_position]
        ELSE:
            result += char

    RETURN result
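The two pseudocode functions above can be combined into one Python sketch. The function name `vigenere` and its `decrypt` flag are assumptions made for this example. As in the pseudocode, the keyword position follows the position in the text, so non-letter characters still use up a keyword letter.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def vigenere(text, keyword, decrypt=False):
    """Encrypt (or decrypt) text with a repeating keyword."""
    result = ""
    keyword = keyword.upper()
    for i, char in enumerate(text.upper()):
        if char in ALPHABET:
            # The keyword letter gives the shift: A=0, B=1, ..., Z=25
            shift = ALPHABET.index(keyword[i % len(keyword)])
            if decrypt:
                shift = -shift
            result += ALPHABET[(ALPHABET.index(char) + shift) % 26]
        else:
            result += char
    return result


print(vigenere("ATTACKATDAWN", "KEY"))                # KXRKGIKXBKAL
print(vigenere("KXRKGIKXBKAL", "KEY", decrypt=True))  # ATTACKATDAWN
```

Notice that the repeated letter A in the plaintext always lines up with K in this example, which hints at why short, repeating keywords weaken the cipher.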

Vigenère Characteristics and Security

Gronsfeld Cipher

The Gronsfeld cipher is a variant of the Vigenère cipher that uses a numeric key instead of a keyword to shift the letters of the plaintext. It shares many similarities with the Vigenère cipher but simplifies the key by restricting it to numeric digits, making it somewhat easier to use and remember.
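As a rough illustration, a Gronsfeld cipher can be sketched by replacing the keyword letters of the Vigenère cipher with digits. The function name `gronsfeld` and the key "31415" are hypothetical choices made for this example only.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def gronsfeld(text, numeric_key, decrypt=False):
    """Shift each letter by the matching digit of a repeating numeric key."""
    digits = [int(digit) for digit in numeric_key]
    result = ""
    for i, char in enumerate(text.upper()):
        if char in ALPHABET:
            # Each digit is a shift between 0 and 9
            shift = digits[i % len(digits)]
            if decrypt:
                shift = -shift
            result += ALPHABET[(ALPHABET.index(char) + shift) % 26]
        else:
            result += char
    return result


print(gronsfeld("HELLO", "31415"))                # KFPMT
print(gronsfeld("KFPMT", "31415", decrypt=True))  # HELLO
```

Because each key position can only take one of ten values instead of twenty-six, a Gronsfeld key of a given length offers fewer possibilities than a Vigenère keyword of the same length.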

One-time Pad Encryption 📝

One-time pad (OTP) encryption is a theoretically unbreakable encryption technique that uses a random key that is at least as long as the message being encrypted, is kept secret, and is never reused. It was first described by Frank Miller in 1882 and later formalized by Gilbert Vernam in 1917.

one time pad

How OTP Encryption Works

Characteristics and Security

One-Time-Pad Encryption

FUNCTION one_time_pad_encrypt(plain_text, key)
    SET alphabet TO "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    SET cipher_text TO ""

    SET cleaned_plain_text TO UPPER(plain_text)
    SET cleaned_key TO UPPER(key)

    IF LENGTH(cleaned_key) < LENGTH(cleaned_plain_text) THEN
        OUTPUT "error: key is shorter than plaintext."
        RETURN ""
    END IF

    FOR position FROM 0 TO LENGTH(cleaned_plain_text) - 1
        SET plain_position TO INDEX(alphabet, cleaned_plain_text[position])
        SET key_position TO INDEX(alphabet, cleaned_key[position])
        SET cipher_position TO (plain_position + key_position) MOD 26
        SET cipher_text TO cipher_text & alphabet[cipher_position]
    ENDFOR

    RETURN cipher_text
END FUNCTION

// Main program
INPUT "Enter plaintext: " plain_text
INPUT "Enter key (at least as long as plaintext): " key
SET encrypted_text TO one_time_pad_encrypt(plain_text, key)
OUTPUT "Ciphertext: " & encrypted_text

One-Time-Pad Decryption

FUNCTION one_time_pad_decrypt(cipher_text, key)
    SET alphabet TO "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    SET plain_text TO ""

    SET cleaned_cipher_text TO UPPER(cipher_text)
    SET cleaned_key TO UPPER(key)

    IF LENGTH(cleaned_key) < LENGTH(cleaned_cipher_text) THEN
        OUTPUT "error: key is shorter than ciphertext."
        RETURN ""
    END IF

    FOR position FROM 0 TO LENGTH(cleaned_cipher_text) - 1
        SET cipher_position TO INDEX(alphabet, cleaned_cipher_text[position])
        SET key_position TO INDEX(alphabet, cleaned_key[position])
        SET plain_position TO (cipher_position - key_position + 26) MOD 26
        SET plain_text TO plain_text & alphabet[plain_position]
    ENDFOR

    RETURN plain_text
END FUNCTION


// Main program
INPUT "Enter ciphertext: " cipher_text
INPUT "Enter key (at least as long as ciphertext): " key

SET decrypted_text TO one_time_pad_decrypt(cipher_text, key)
OUTPUT "Decrypted text: " & decrypted_text
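A Python sketch of the same idea is shown below, with one addition that is not in the pseudocode: a hypothetical `generate_key` helper that draws a random key from the `secrets` module. The sketch assumes the message contains uppercase letters only.

```python
import secrets

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def generate_key(length):
    """Pick 'length' random letters from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def otp_encrypt(plain_text, key):
    if len(key) < len(plain_text):
        raise ValueError("key is shorter than plaintext")
    # zip pairs each message letter with its own key letter
    return "".join(
        ALPHABET[(ALPHABET.index(p) + ALPHABET.index(k)) % 26]
        for p, k in zip(plain_text, key)
    )


def otp_decrypt(cipher_text, key):
    if len(key) < len(cipher_text):
        raise ValueError("key is shorter than ciphertext")
    return "".join(
        ALPHABET[(ALPHABET.index(c) - ALPHABET.index(k)) % 26]
        for c, k in zip(cipher_text, key)
    )


print(otp_encrypt("HELLO", "XMCKL"))  # EQNVZ

key = generate_key(5)
print(otp_decrypt(otp_encrypt("HELLO", key), key))  # HELLO
```

The key must be used for one message only. Reusing it gives an attacker two ciphertexts made with the same shifts, which removes the guarantee of unbreakability.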

Symmetric Encryption 📝

Symmetric encryption is a type of encryption where only one key (a secret key) is used to both encrypt and decrypt electronic information. The entities communicating via symmetric encryption must exchange the key so that it can be used in the decryption process.

By using symmetric encryption algorithms, data is converted to a form that cannot be understood by anyone who does not possess the secret key to decrypt it. Once the intended recipient who possesses the key has the message, the algorithm reverses its action so that the message is returned to its original and understandable form. The secret key that the sender and recipient both use could be a specific password/code or it can be a random string of letters or numbers that has been generated by a secure random number generator.

There are two types of symmetric encryption algorithms:

  1. Block algorithms: Data is encrypted in fixed-length blocks of bits using a specific secret key. As the data is being encrypted, the system holds it in memory while it waits for a complete block.

  2. Stream algorithms: Data is encrypted as it streams instead of being retained in the system’s memory.

Symmetric Encryption Uses

While symmetric encryption is an older method of encryption, it is faster and more efficient than asymmetric encryption. Symmetric cryptography is typically used for encrypting large amounts of data, e.g. for database encryption. In the case of a database, the secret key might only be available to the database itself to encrypt or decrypt.

Some examples of where symmetric cryptography is used are:

Block Ciphers

A block cipher takes a block of plaintext bits and generates a block of ciphertext bits, generally of the same size. The block size is fixed for a given scheme. The strength of the cipher depends mainly on the key length rather than the block size, although a small block size (such as 64 bits) can become a weakness when large amounts of data are encrypted under one key.

Feistel Block Cipher

A Feistel block cipher is not an encryption method in itself. It is a model or process that is used by many block ciphers to encode the plain text into a cipher text. A Feistel cipher requires a key and a set length portion of the plain text and occurs over a designated number of rounds.
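To make the idea of rounds concrete, here is a toy Feistel network operating on 32-bit blocks split into two 16-bit halves. Everything in it is an assumption made for illustration: the round function, the block size and the four round keys are invented and offer no real security.

```python
def round_function(half, round_key):
    # Toy mixing step, NOT secure: real ciphers use far more complex functions
    return ((half * 31) ^ round_key) & 0xFFFF


def feistel_encrypt(block, round_keys):
    left, right = block >> 16, block & 0xFFFF
    for key in round_keys:
        # Each round: the halves swap, and the old left half is XORed
        # with the round function applied to the old right half
        left, right = right, left ^ round_function(right, key)
    return (left << 16) | right


def feistel_decrypt(block, round_keys):
    left, right = block >> 16, block & 0xFFFF
    # Same structure, but the round keys are applied in reverse order
    for key in reversed(round_keys):
        left, right = right ^ round_function(left, key), left
    return (left << 16) | right


round_keys = [0x1A2B, 0x3C4D, 0x5E6F, 0x7081]  # hypothetical round keys
plain_block = 0xDEADBEEF
cipher_block = feistel_encrypt(plain_block, round_keys)
print(hex(cipher_block))
print(feistel_decrypt(cipher_block, round_keys) == plain_block)  # True
```

A useful property of this structure is that the round function itself never needs to be reversed: decryption reuses it and only reverses the order of the round keys.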

Data Encryption Standard (DES)

DES is a symmetric key algorithm developed in the 1970s by IBM and later adopted by the U.S. government as a standard. It uses a 56-bit key to encrypt 64-bit blocks of data. DES processes data using 16 rounds of a Feistel network, which involves permutations, substitutions, and XOR operations.

DES is now considered insecure for many applications due to its relatively short key length, which makes it vulnerable to brute-force attacks.
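The effect of key length on brute-force attacks can be estimated with some simple arithmetic. The sketch below assumes a hypothetical attacker who can test one trillion keys per second; that rate is an invented figure for illustration, not a measurement. The key lengths are the ones quoted in this section for DES, the effective strength of 3DES, and the smallest AES key.

```python
KEYS_PER_SECOND = 10 ** 12  # hypothetical attacker speed (assumption)
SECONDS_PER_YEAR = 60 * 60 * 24 * 365


def brute_force_hours(key_bits, keys_per_second=KEYS_PER_SECOND):
    """Hours needed to try every possible key of the given length."""
    total_keys = 2 ** key_bits
    return total_keys / keys_per_second / 3600


for name, bits in [("DES", 56), ("3DES (effective)", 112), ("AES-128", 128)]:
    hours = brute_force_hours(bits)
    years = hours * 3600 / SECONDS_PER_YEAR
    print(f"{name}: 2^{bits} keys, about {hours:.3g} hours ({years:.3g} years)")
```

At this assumed rate every DES key could be tried in roughly 20 hours, while each extra bit of key length doubles the work required.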

Triple DES (3DES)

Triple DES enhances the security of DES by applying the DES algorithm three times to each data block. It uses three 56-bit keys, giving it a nominal 168-bit key length, though meet-in-the-middle attacks reduce its effective security to about 112 bits.

Despite its increased security, 3DES is slower than other modern symmetric algorithms.

3DES

Advanced Encryption Standard (AES)

AES is the current U.S. federal standard for encryption, replacing DES and 3DES. It supports key sizes of 128, 192, and 256 bits and operates on 128-bit blocks of data. AES uses a substitution-permutation network with 10, 12, or 14 rounds depending on the key size.

AES is widely regarded as secure and is used globally in various applications.

Blowfish

Blowfish is a symmetric key block cipher designed by Bruce Schneier in 1993. It has a variable key length from 32 bits to 448 bits, making it highly flexible. Blowfish operates on 64-bit blocks and is known for its speed in software implementations. It uses a Feistel network with 16 rounds of processing.

While secure, Blowfish’s 64-bit block size is now considered a limitation for certain applications.

Twofish

Twofish is a symmetric key block cipher, also designed by Bruce Schneier as a successor to Blowfish. It operates on 128-bit blocks and supports key sizes up to 256 bits. Twofish uses a Feistel network and is optimized for hardware and software performance.

Twofish is known for its flexibility, speed, and security, which made it one of the five finalists in the AES competition.


Asymmetric Encryption 📝

The biggest weakness of symmetric encryption is the need for key distribution. That is, there needs to be a secure way for all parties to agree upon the key that will be used for encryption and decryption. This is not a problem if you wish to encrypt data on your own hard drive, or in a database on your server, because the key never has to leave your control. Unfortunately, symmetric encryption alone is not practical for online communication, where the parties have no secure channel over which to share a key. For this we need asymmetric encryption.

Asymmetric communication refers to a method of communication in which the parties involved use different keys for encryption and decryption processes. This is a key feature of asymmetric encryption, also known as public-key cryptography.

Key Components of Asymmetric Communication

The public key is openly distributed and used to encrypt data. Anyone can use this key to encrypt a message intended for a specific recipient. For example, if Alice wants to send a secure message to Bob, she would use Bob’s public key to encrypt the message.

The private key is kept secret and is used to decrypt data that has been encrypted with the corresponding public key. Continuing the example, Bob would use his private key to decrypt the message sent by Alice.

How Asymmetric Communication Works

  1. The sender (Alice) and the receiver (Bob) agree on and publicly share two parameters: a prime modulus and a generator.

  2. Alice then chooses a private key and calculates a public key, which she shares publicly with Bob.

  3. Bob selects his own private key, calculates a public key, and shares it publicly with Alice.

  4. Alice takes Bob’s public key and raises it to the power of her private key, modulo the prime modulus, to compute a shared key.

  5. Bob does the same with Alice’s public key and his private key to also establish the shared key.

  6. Alice and Bob now have a shared key they can use for symmetric encryption.

  7. Without knowing Alice’s or Bob’s private key, an eavesdropper (Eve) cannot feasibly derive the shared key from the public values.
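The steps above describe what is commonly called Diffie-Hellman key exchange. They can be traced with deliberately tiny numbers, as in the sketch below. The specific values (prime modulus 23, generator 5, private keys 4 and 3) are assumptions chosen so the arithmetic is easy to check by hand; real systems use primes that are thousands of bits long.

```python
# Step 1: public parameters, known to everyone including Eve
prime_modulus = 23
generator = 5

# Steps 2 and 3: each party picks a private key and publishes a public key
alice_private = 4
bob_private = 3
alice_public = pow(generator, alice_private, prime_modulus)  # 5^4 mod 23 = 4
bob_public = pow(generator, bob_private, prime_modulus)      # 5^3 mod 23 = 10

# Steps 4 and 5: each party combines the other's public key with their own private key
alice_shared = pow(bob_public, alice_private, prime_modulus)  # 10^4 mod 23 = 18
bob_shared = pow(alice_public, bob_private, prime_modulus)    # 4^3 mod 23 = 18

print(alice_public, bob_public)   # 4 10
print(alice_shared, bob_shared)   # 18 18
```

The three-argument form `pow(base, exponent, modulus)` gives the same result as `base ** exponent % modulus` but stays fast when the numbers are very large.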

Advantages of Asymmetric Communication

Asymmetric communication enhances security because the private key is never shared and remains secure with the owner. Even if the public key is widely distributed, only the intended recipient can decrypt the message.

Asymmetric encryption also simplifies key distribution. Unlike symmetric encryption, where the same key must be securely shared between parties, the public key can be freely distributed without compromising security.

Disadvantages of Asymmetric Communication

Asymmetric encryption algorithms are generally slower than symmetric algorithms, making them less suitable for encrypting large amounts of data directly.

The process of managing public and private keys and ensuring their integrity also adds complexity to the system.

Applications of Asymmetric Communication

Try Asymmetric Communication

Work in threes and take the roles of Alice, Bob and Eve.

  1. Alice and Bob agree on and publicly share a prime modulus.

  2. Alice and Bob agree on and publicly share a generator.

  3. Alice and Bob each choose a private key and keep it to themselves.

  4. Exchange of public keys:

    • Alice and Bob each perform the following calculation (in Python) and share the result publicly:

public_key = generator ** private_key % prime_modulus

  5. Alice and Bob now work out the shared secret key.

    • Use the following calculations (in Python):

# for Alice
shared_key = bob_pub_key ** alice_private_key % prime_modulus

# for Bob
shared_key = alice_pub_key ** bob_private_key % prime_modulus

  6. Alice and Bob now use the shared key to send messages to each other using symmetric encryption, and Eve attempts to work out the message.
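For whoever plays Eve, one possible approach with a small prime modulus is to try every private key until one produces the observed public key. The sketch below is an illustration only: the function name `brute_force_private_key` and the example values (prime modulus 23, generator 5, public keys 4 and 10) are assumptions, not part of the activity.

```python
def brute_force_private_key(public_key, generator, prime_modulus):
    """Try every candidate private key until one reproduces the public key."""
    for candidate in range(1, prime_modulus):
        if pow(generator, candidate, prime_modulus) == public_key:
            return candidate
    return None  # no candidate matched


# Values Eve can see, because they were shared publicly
prime_modulus = 23
generator = 5
alice_public = 4
bob_public = 10

# Eve recovers Bob's private key, then computes the shared key as Bob would
bob_private_guess = brute_force_private_key(bob_public, generator, prime_modulus)
shared_key = pow(alice_public, bob_private_guess, prime_modulus)
print(bob_private_guess, shared_key)  # 3 18
```

This search only works because the modulus is tiny. With a prime that is thousands of bits long, the number of candidates is far too large to try.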

RSA

RSA is one of the most widely used asymmetric encryption algorithms, developed by Ron Rivest, Adi Shamir, and Leonard Adleman in 1977. It relies on the mathematical difficulty of factoring a very large number that is the product of two large prime numbers.

RSA uses two keys: a public key for encryption and a private key for decryption. Key lengths typically range from 2048 to 4096 bits, providing strong security. It is commonly used in secure data transmission, digital signatures, and key exchange protocols.

RSA is computationally intensive and slower compared to symmetric algorithms, making it less suitable for encrypting large amounts of data directly.
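The relationship between the two RSA keys can be seen with a toy example. The sketch below uses very small primes (61 and 53) chosen purely for illustration, and it encrypts a single number rather than real text. Real RSA uses much larger primes together with padding schemes that are not shown here. It assumes Python 3.8 or later for the modular inverse form of `pow`.

```python
# Key generation with toy primes (assumption: far too small for real use)
p, q = 61, 53
n = p * q                  # 3233, the modulus shared by both keys
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, chosen to share no factor with phi
d = pow(e, -1, phi)        # 2753, private exponent (modular inverse of e)

public_key = (e, n)
private_key = (d, n)

# Encrypt with the public key, decrypt with the private key
message = 65
ciphertext = pow(message, e, n)    # 2790
recovered = pow(ciphertext, d, n)  # 65

print(ciphertext, recovered)  # 2790 65
```

Anyone who could factor `n` back into `p` and `q` could recompute `d`, which is why the primes must be large enough to make factoring impractical.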


Hashing 📝

Hashing is the process of converting data into a fixed-size string of characters, which typically looks like a random sequence of letters and numbers. This is done using a specific algorithm.
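As a quick illustration, the sketch below uses SHA-256 from Python's standard `hashlib` module. SHA-256 is just one example of a hashing algorithm, and the helper name `sha256_hex` is an assumption made for this example.

```python
import hashlib


def sha256_hex(text):
    """Return the SHA-256 hash of a string as hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_digest = sha256_hex("hello")
long_digest = sha256_hex("hello" * 1000)
changed_digest = sha256_hex("Hello")

# The output length is fixed, no matter how long the input is
print(len(short_digest), len(long_digest))    # 64 64

# The same input always gives the same hash
print(short_digest == sha256_hex("hello"))    # True

# Changing one character gives a different hash
print(short_digest == changed_digest)         # False
```

Unlike encryption, there is no key and no decryption step: a hash is designed to be computed in one direction only.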

How Hashing works

Why Hashing is important


Data Compression 📝

Data compression is the process of reducing the size of a file or data set. This means the data takes up less space on storage devices and can be transmitted more quickly over networks.
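The effect can be seen with Python's standard `zlib` module, which performs lossless compression (the original data can be restored exactly). The sample data below is an assumption chosen because highly repetitive data compresses very well. Lossy compression, the other type, is not shown.

```python
import zlib

# Highly repetitive sample data (assumption: chosen to compress well)
original = b"ABABABABAB" * 100

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original))         # 1000
print(len(compressed))       # far fewer bytes than the original
print(restored == original)  # True
```

Data with little repetition, such as random bytes or files that are already compressed, will shrink far less.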

How Data Compression works

Why Data Compression is important


Checksums 📝

A checksum is a value used to check whether data has been altered during storage or transmission.

Checksums are commonly used in file downloads, networking, and data storage to ensure integrity of data.

Here’s a very simple checksum example with numbers:

| Step | Data |
| --- | --- |
| Sending block of data | 12, 7, 5, 20 |
| Add the numbers together | 12 + 7 + 5 + 20 = 44 |
| Take the last digit only (mod 10) | 44 → checksum = 4 |
| Send data with checksum | Data = [12, 7, 5, 20], Checksum = 4 |
| Receiver adds the numbers again | 12 + 7 + 5 + 20 = 44 → checksum = 4 |
| Compare with the sent checksum | If both are 4 → data is correct. If not → data is corrupted |

Here’s the same process, but with an error:

| Step | Data |
| --- | --- |
| Original data | 12, 7, 5, 20 |
| Sender’s checksum | 12 + 7 + 5 + 20 = 44 → checksum = 4 |
| Data is transmitted, but one number changes (error) | 12, 7, 6, 20 |
| Receiver calculates checksum | 12 + 7 + 6 + 20 = 45 → checksum = 5 |
| Compare checksums | Sent checksum = 4, received checksum = 5 → they do not match, so the data is corrupted |
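The two worked examples above can be reproduced with a short Python sketch. The function names `checksum` and `is_intact` are assumptions made for this example.

```python
def checksum(numbers):
    """Add the numbers together and keep only the last digit (mod 10)."""
    return sum(numbers) % 10


def is_intact(received_numbers, sent_checksum):
    """Recalculate the checksum and compare it with the one that was sent."""
    return checksum(received_numbers) == sent_checksum


sent_data = [12, 7, 5, 20]
sent_checksum = checksum(sent_data)
print(sent_checksum)                             # 4

print(is_intact([12, 7, 5, 20], sent_checksum))  # True  (data is correct)
print(is_intact([12, 7, 6, 20], sent_checksum))  # False (data is corrupted)
```

A checksum this simple can miss errors that cancel each other out. For example, `[12, 7, 6, 19]` also adds up to 44, so it would pass the check even though two numbers have changed.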