Homework 5: Encryption

Assigned
  • February 28, 2024
Due
  • March 4, 2024 11:59pm
Collaboration
    Regular policies for collaboration apply to this homework assignment. Make sure you review and understand the policies on the syllabus before you discuss your work on this assignment with anyone else.
Submitting
    Submit your completed caesarcipher.c file, a working Makefile, and tests.txt to Gradescope under the corresponding assignment. You may submit as many times as you like up until the deadline.

Overview

For this assignment, you will implement a classic (but not recommended!!) encryption technique. Classical encryption is based on the principle of substituting letters for other letters. These may be simple schemes like the ones we’ll implement in this homework or more complicated schemes such as the ones employed in the Enigma machine. The Enigma machine was used by Nazi Germany during World War II and was subsequently broken by the Allies, in particular the researchers at the British Government Code and Cypher School, notably Alan Turing who led the development of the Bombe machines that were used to break the Enigma machines.

Encryption Preliminaries

The theory of encryption is a cornerstone of cryptography and computer security where we make our communication and/or data secure from adversaries.

Before the advent of computers, classical cryptography used various transposition and substitution ciphers to encrypt data. Here are some basic definitions you should know before proceeding.

  • A cipher is a pair of algorithms for encrypting and decrypting data.
  • Plaintext is un-encrypted data.
  • Ciphertext is encrypted data.
  • A transposition cipher creates ciphertext from plaintext by re-arranging the letters in the text in some pre-determined fashion.
  • A substitution cipher creates ciphertext from plaintext by substituting one letter for another letter.

Ciphers in classical cryptography were designed to be executed by hand with analog tools. For example, the Solitaire Cipher was designed to be hand-executed by secret agents in the field with nothing more than a deck of cards. More elaborate encryption schemes required the use of mechanical computing devices that became the forefathers of the modern-day digital computers, the Enigma machine described in the introduction being the quintessential example. However, in the age of digital computers, classical cryptography methods are no longer useful for most practical purposes because a computer can either brute-force through these encryption schemes, or otherwise greatly assist an educated individual in breaking the code.

To implement these classical cryptographic schemes, we need to understand how to map the mathematical models that underlie them into C. Luckily this is relatively straightforward. For sake of simplicity, let’s assume that we’re only working with lowercase English letters and that each letter is assigned a number—from ‘a’ represented by 0 to ‘z’ represented by 25—called the character code of that letter. Given a single letter ch and a single letter key to encrypt the letter with:

  • To encrypt ch with key, “add” key to ch. That is, add their corresponding character codes, and the result is the character code of the corresponding encrypted letter. If you go over 25, wrap around. For example, If we encrypt c with j, then we get the letter l because the code for c is 2 and the code for j is 9. Thus, 2 + 9 = 11 which is the code for the letter l. If we encrypt x with e, then we get b because the code for x is 23 and the code for e is 4. 23 + 4 = 27 but since 27 isn’t a valid code, we wrap around and get 1 which is the code for b.
  • To decrypt ch with key, “subtract” key from ch. In the case when the difference is negative, we wrap around like with addition.

By thinking of characters as numbers, we can formalize this style of encryption as well as directly implement it in C. With character values, encryption can be thought of the formula:

E = (ch + key) mod 26. 

Decryption is described by the formula:

D = (ch - key) mod 26. 

Mod here is almost the % operator, but not quite. The problem is that negative integers do not “wrap around” like you expect, e.g., -2 % 26 is -2 rather than 24. You will need to do a little bit of extra work to obtain the desired behavior in this case.

To get the character value of a single character, we can convert it to an integer by casting to the appropriate type:

int ch = (int) 'e'; // ch contains 101 

However, we said that the character code for e is 4. What is going on here? It turns out that we assign character codes to not just lowercase letters but to all possible letters. Imagine putting all the possible letters on a line. The lowercase letters occupy indices 97 through 122 on that line:

 int ch1 = (int) 'a'; // ch1 contains 97
 int ch2 = (int) 'z'; // ch2 contains 122 

To “re-base” these numbers at index zero, we simply need to subtract the character value of a.

 int base = (int) 'a';
 int b1 = 'a' - base; // b1 contains 0
 int b2 = 'z' - base; // b2 contains 25 

When we want to get a letter back given a computed character value in the range 0-25, we simply reverse the process by adding back in (int) ‘a’ and then casting back to char.

int result = 22; 
char ch = (char)(result + base); // ch contains w 

The Caesar Cipher

With the fundamentals of manipulating characters-as-numbers out of the way, we will now implement a number of classic ciphers based off these cryptographic principles. First, we will implement the Caesar Cipher, so named after Julius Caesar who used this encryption for his own private correspondence.

In terms of the formula described above, the key we use to add and subtract to each character is constant. That is, for any message, we pick a particular value n and encrypt a message with:

E = (ch + n) mod 26 

And we decrypt a message with

D = (ch - n) mod 26 

For example, say we want to encrypt the message hello, we would pick a key n, say n = 11. Then, to encode the message, we calculate:

 'h' + 11 = 7 + 11 = 18 = 's'
 'e' + 11 = 4 + 11 = 15 = 'p'
 'l' + 11 = 11 + 11 = 22 = 'w'
 'l' + 11 = 11 + 11 = 22 = 'w'
 'o' + 11 = 14 + 11 = 25 = 'z' 

To decrypt the message, we subtract the key rather than add it:

 's' - 11 = 18 - 11 = 7
 'p' - 11 = 15 - 11 = 4
 'w' - 11 = 22 - 11 = 11
 'w' - 11 = 22 - 11 = 11
 'z' - 11 = 25 - 11 = 14 

Your task is to write a program called caesarcipher that encodes and decodes messages using the Caesar cipher as described above. Because there are only 26 letters in the English alphabet, rather than shifting according to a user-defined value, we can simply show the user the result of applying all 26 possible shifts!

Here are some example executions of the program you should create.

Example 1:

 $ ./ caesarcipher
 This program encrypts and decrypts messages using the Caesar Cipher.
 Type e to encode or d to decode a message: e
 Enter the string to encode: helloworld
 n = 0: helloworld
 n = 1: ifmmpxpsme
 n = 2: jgnnqyqtnf
 n = 3: khoorzruog
 n = 4: lippsasvph
 n = 5: mjqqtbtwqi
 n = 6: nkrrucuxrj
 n = 7: olssvdvysk
 n = 8: pmttwewztl
 n = 9: qnuuxfxaum
 n = 10: rovvygybvn
 n = 11: spwwzhzcwo
 n = 12: tqxxaiadxp
 n = 13: uryybjbeyq
 n = 14: vszzckcfzr
 n = 15: wtaadldgas
 n = 16: xubbemehbt
 n = 17: yvccfnficu
 n = 18: zwddgogjdv
 n = 19: axeehphkew
 n = 20: byffiqilfx
 n = 21: czggjrjmgy
 n = 22: dahhksknhz
 n = 23: ebiiltloia
 n = 24: fcjjmumpjb
 n = 25: gdkknvnqkc

Example 2:

 $ ./ caesarcipher
 This program encrypts and decrypts messages using the Caesar Cipher.
 Type e to encode or d to decode a message: d
 Enter the string to decode: dahhksknhz
 n = 0: dahhksknhz
 n = 1: czggjrjmgy
 n = 2: byffiqilfx
 n = 3: axeehphkew
 n = 4: zwddgogjdv
 n = 5: yvccfnficu
 n = 6: xubbemehbt
 n = 7: wtaadldgas
 n = 8: vszzckcfzr
 n = 9: uryybjbeyq
 n = 10: tqxxaiadxp
 n = 11: spwwzhzcwo
 n = 12: rovvygybvn
 n = 13: qnuuxfxaum
 n = 14: pmttwewztl
 n = 15: olssvdvysk
 n = 16: nkrrucuxrj
 n = 17: mjqqtbtwqi
 n = 18: lippsasvph
 n = 19: khoorzruog
 n = 20: jgnnqyqtnf
 n = 21: ifmmpxpsme
 n = 22: helloworld
 n = 23: gdkknvnqkc
 n = 24: fcjjmumpjb
 n = 25: ebiiltloia

Example 3:

$ ./ caesarcipher
 This program encrypts and decrypts messages using the Caesar Cipher.
 Type e to encode or d to decode a message: b
 Valid options are 'e' to encode or 'd' to decode" 

Note how the program operates:

  1. The caesarcipher program first describes what it does.
  2. It prompts the user, asking whether the user wants to encode or decode a message.
  3. If the user does not respond with either encode or decode, the program displays the valid options to the user (as shown above) and then exits.
  4. Otherwise, the program prompts the user for the string to either encode or decode.
  5. Finally, the program displays all 26 possible ways of encoding or decoding that string.

Your program must follow this format exactly and produce identical output to the sample executions above.

We will impose two limitations on user input to simplify your implementation:

  1. Your program should accept inputs with a maximum length of 100 characters. If the user provides a longer input, truncate the message to the first 100 characters.

  2. Your program should limit the user to inputs containing lowercase characters only (no uppercase letters, numbers, punctuation, or whitespace). If the user input contains any disallowed characters your program must print an error and ask the user to try again with a valid input.

Grading

Your assignment will be graded on a scale of 0–100 points based on the following criteria:

  • The program includes a working Makefile with correctly-defined targets all, clean, and caesarcipher (15 points)
  • The program requests the operation mode (encode vs. decode) and accepts the appropriate string input (20 points)
    • the program reads input until it receives a newline character or reaches an input limit.
    • if the user does not enter the appropriate response, the program gives an error message and terminates.
    • the program validates the input and reports an error if unacceptable characters are included.
  • The program displays the the table for all possible encryption/decription keys (n) (25 points)
    • The program correctly encodes each character.
    • The program correctly decodes each character.
  • The implementation is organized into at least three sensible functions (other than main) that take and use parameters. (20 points)
    • Usually it is easy to describe what a function does if it is sensible. If you can’t come up with a short explanation of what a function does it probably isn’t sensible.
    • Unnecessary parameters added to functions will not count toward the requirement of three functions accepting parameters, although you are welcome to add functions to your implementation that do not accept parameters if they help to organize your code.
  • The tests.txt file describes a reasonably comprehensive set of tests for the implementation (20 points)
    • Test encryption and decryption of valid messages of moderate length
    • Test very short and very long inputs
    • Test invalid selections for all user prompts
    • Any other edge cases you can think of
    • Use the tests.txt format from our previous assignment.

Code Quality

There will be potential deductions for any code quality violations. For now, there are six requirements for quality code:

  1. Indent the body of all loops and conditionals one level deeper than the code just outside the loop. Be consistent with your indentation style.
  2. Use curly braces for every loop or conditional body, regardless of the number of lines.
  3. Include comments that explain what your code does. There should be at least one comment by the start of each loop and ifelse block that explains what the purpose of that code is. The comment should not simply restate the code; it should add information for a human who reads the code and is confused.
  4. Give variables descriptive names that help a reader understand what they will be used for.
  5. Do not read from or display the value of any variable without first initializing it.
  6. Your code must compile with no errors and no warnings.