Translated to Lua from chapter 14 of Invent Your Own Computer Games with Python by Al Sweigart, licensed under Creative Commons Attribution-Noncommercial-Share Alike 3.0. Thanks Al! :)
The program in this tutorial will convert normal English into a secret code, and also convert secret codes back into regular English again. Only someone who is knowledgeable about secret codes will be able to understand our secret messages.
Because this program manipulates text in order to convert it into secret messages, we will learn several new functions and methods that come with Lua for manipulating strings. We will also learn how programs can do math with text strings just as they can with numbers.
The science of writing secret codes is called cryptography. Cryptography has been used for thousands of years to send secret messages that only the recipient could understand, even if someone captured the messenger and read the coded message. A secret code system is called a cipher. There are thousands of different ciphers that have been used, each using different techniques to keep the messages a secret.
In cryptography, we call the message that we want to be secret the plaintext. The plaintext could look something like this:
Hello there! The keys to the house are hidden under the reddish flower pot.
When we convert the plaintext into the encoded message, we call this encrypting the plaintext. The plaintext is encrypted into the ciphertext. The ciphertext looks like random letters (also called garbage data), and we cannot understand what the original plaintext was by just looking at the ciphertext. Here is an example of some ciphertext:
Ckkz fkx kj becqnejc kqp pdeo oaynap iaoowca!
But if we know about the cipher used to encrypt the message, we can decrypt the ciphertext back to the plaintext. (Decryption is the opposite of encryption.)
Many ciphers also use keys. Keys are secret values that let you decrypt ciphertext that was encrypted using a specific cipher. Think of the cipher as being like a door lock. Although all the door locks of the same type are built the same, a particular lock will only unlock if you have the key made for that lock.
Figure 1: Shifting over letters by three spaces. Here, B becomes E.
When we encrypt a message using a cipher, we will choose the key that is used to encrypt and decrypt this message. The key for our Caesar Cipher will be a number from 1 to 26. Unless you know the key (that is, know the number), you will not be able to decrypt the encrypted message.
The Caesar Cipher was one of the earliest ciphers ever invented. In this cipher, you encrypt a message by taking each letter in the message (in cryptography, these letters are called symbols because they can be letters, numbers, or any other sign) and replacing it with a "shifted" letter. If you shift the letter A by one space, you get the letter B. If you shift the letter A by two spaces, you get the letter C. Figure 1 is a picture of some letters shifted over by 3 spaces.
To get each shifted letter, draw out a row of boxes with each letter of the alphabet. Then draw a second row of boxes under it, but start a certain number of spaces over. When you get to the leftover letters at the end, wrap around back to the start of the boxes. Here is an example with the letters shifted by three spaces:
Figure 2: The entire alphabet shifted by three spaces.
The number of spaces we shift is the key in the Caesar Cipher. The example above shows the letter translations for the key 3.
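The second row of boxes can also be built in code. The following is a minimal sketch, not part of the tutorial's program: shiftedAlphabet is a hypothetical helper name, and it assumes the key is a whole number from 1 to 26.

```lua
-- Hypothetical helper (not in the tutorial's program): build the bottom
-- row of boxes for a given key.
local ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

local function shiftedAlphabet(key)
  -- Everything after the first `key` letters, followed by the leftover
  -- letters that wrap around to the end of the row.
  return ALPHABET:sub(key + 1) .. ALPHABET:sub(1, key)
end

print(ALPHABET)            -- top row:    ABCDEFGHIJKLMNOPQRSTUVWXYZ
print(shiftedAlphabet(3))  -- bottom row: DEFGHIJKLMNOPQRSTUVWXYZABC
```

Reading down a column gives the encrypted letter, and reading up a column gives the decrypted letter.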
Using a key of 3, if we encrypt the plaintext "Howdy", then the "H" becomes "K". The letter "o" becomes "r". The letter "w" becomes "z". The letter "d" becomes "g". And the letter "y" becomes "b". The ciphertext of "Howdy" with key 3 becomes "Krzgb".
We will keep any non-letter characters the same. In order to decrypt "Krzgb" with the key 3, we just go from the bottom boxes back to the top. The letter "K" becomes "H", the letter "r" becomes "o", the letter "z" becomes "w", the letter "g" becomes "d", and the letter "b" becomes "y" to form "Howdy".
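The "top boxes to bottom boxes" lookup can be sketched directly with two Lua tables. This is only an illustration of the box method for key 3, not how the program later in this chapter works. The names toCipher and toPlain are made up for this example.

```lua
-- The two rows of boxes for key 3, written out by hand.
local top    = 'abcdefghijklmnopqrstuvwxyz'
local bottom = 'defghijklmnopqrstuvwxyzabc'

-- Hypothetical lookup tables: top row -> bottom row, and back again.
local toCipher, toPlain = {}, {}
for i = 1, #top do
  local p, c = top:sub(i, i), bottom:sub(i, i)
  toCipher[p], toCipher[p:upper()] = c, c:upper()
  toPlain[c], toPlain[c:upper()] = p, p:upper()
end

-- gsub replaces each letter with its table entry; characters that have
-- no entry (spaces, punctuation) are left unchanged.
local ciphertext = ('Howdy'):gsub('%a', toCipher)
local plaintext = ciphertext:gsub('%a', toPlain)

print(ciphertext)  -- Krzgb
print(plaintext)   -- Howdy
```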
You can find out more about the Caesar Cipher from Wikipedia at http://en.wikipedia.org/wiki/Caesar_cipher
How do we implement this shifting of the letters in our program? We can do this by representing each letter as a number (called an ordinal), and then adding or subtracting from this number to form a new number (and a new letter). ASCII (pronounced "ask-ee", short for American Standard Code for Information Interchange) is a code that connects each character to a number from 0 to 127. The numbers less than 32 (and 127 itself) refer to "unprintable" control characters, so we will not be using them.
The capital letters "A" through "Z" have the ASCII numbers 65 through 90. The lowercase letters "a" through "z" have the ASCII numbers 97 through 122. The numeric digits "0" through "9" have the ASCII numbers 48 through 57.
32 | (space) | 48 | 0 | 64 | @ | 80 | P | 96 | ` | 112 | p |
33 | ! | 49 | 1 | 65 | A | 81 | Q | 97 | a | 113 | q |
34 | " | 50 | 2 | 66 | B | 82 | R | 98 | b | 114 | r |
35 | # | 51 | 3 | 67 | C | 83 | S | 99 | c | 115 | s |
36 | $ | 52 | 4 | 68 | D | 84 | T | 100 | d | 116 | t |
37 | % | 53 | 5 | 69 | E | 85 | U | 101 | e | 117 | u |
38 | & | 54 | 6 | 70 | F | 86 | V | 102 | f | 118 | v |
39 | ' | 55 | 7 | 71 | G | 87 | W | 103 | g | 119 | w |
40 | ( | 56 | 8 | 72 | H | 88 | X | 104 | h | 120 | x |
41 | ) | 57 | 9 | 73 | I | 89 | Y | 105 | i | 121 | y |
42 | * | 58 | : | 74 | J | 90 | Z | 106 | j | 122 | z |
43 | + | 59 | ; | 75 | K | 91 | [ | 107 | k | 123 | { |
44 | , | 60 | < | 76 | L | 92 | \ | 108 | l | 124 | | |
45 | - | 61 | = | 77 | M | 93 | ] | 109 | m | 125 | } |
46 | . | 62 | > | 78 | N | 94 | ^ | 110 | n | 126 | ~ |
47 | / | 63 | ? | 79 | O | 95 | _ | 111 | o |
So if we wanted to shift "A" by three spaces, we first convert it to a number (65). Then we add 3 to 65, to get 68. Then we convert the number 68 back to a letter ("D"). We will use the string.char() and string.byte() functions to convert between letters and numbers.
For example, the letter "A" is represented by the number 65. The letter "m" is represented by the number 109. All the ASCII characters from 32 to 126 are listed in the table above.
The string.char() function (short for "character") takes an ASCII number for the parameter and returns the single-character string. The string.byte() function takes a single-character string for the parameter, and returns the numeric ASCII value for that character. Try typing the following into the interactive prompt:
    = string.char(65)
    A
    = string.byte('A')
    65
    = string.char(65+8)
    I
    = string.char(52)
    4
    = string.char(string.byte('F'))
    F
    = string.byte(string.char(68))
    68
On the third line, string.char(65+8) evaluates to string.char(73). If you look at the ASCII table, you can see that 73 is the ordinal for the capital letter "I". On the fifth line, string.char(string.byte('F')) evaluates to string.char(70) which evaluates to 'F'. Feeding the result of string.byte() to string.char() will evaluate to the same as the original argument. The same goes for feeding the result of string.char() to string.byte(), as shown by the sixth line.
Using string.char() and string.byte() will come in handy for our Caesar Cipher program.
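As a small sketch of the "convert, add, convert back" idea, here is the shift of "A" by three spaces written out step by step. The variable names are ours. The last two lines show what goes wrong without wrap-around, which the program has to handle.

```lua
-- Shift 'A' by three spaces using its ASCII ordinal.
local ordinal = string.byte('A')     -- 65
local shifted = ordinal + 3          -- 68
local letter = string.char(shifted)  -- 'D'
print(ordinal, shifted, letter)

-- Without wrap-around, shifting 'Z' by three runs past the alphabet:
-- 90 + 3 is 93, which is the ordinal of ']' in the ASCII table.
local overshoot = string.char(string.byte('Z') + 3)
print(overshoot)
```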
Here is a sample run of the Caesar Cipher program, encrypting a message:
    Do you wish to encrypt or decrypt a message?
    encrypt
    Enter your message:
    The sky above the port was the color of television, tuned to a dead channel.
    Enter the key number (1-26)
    13
    Your translated text is:
    Gur fxl nobir gur cbeg jnf gur pbybe bs gryrivfvba, gharq gb n qrnq punaary.
Now we will run the program and decrypt the text that we just encrypted.
    Do you wish to encrypt or decrypt a message?
    decrypt
    Enter your message:
    Gur fxl nobir gur cbeg jnf gur pbybe bs gryrivfvba, gharq gb n qrnq punaary.
    Enter the key number (1-26)
    13
    Your translated text is:
    The sky above the port was the color of television, tuned to a dead channel.
On this run we will try to decrypt the text that was encrypted, but we will use the wrong key. Remember that if you do not know the correct key, the decrypted text will just be garbage data.
    Do you wish to encrypt or decrypt a message?
    decrypt
    Enter your message:
    Gur fxl nobir gur cbeg jnf gur pbybe bs gryrivfvba, gharq gb n qrnq punaary.
    Enter the key number (1-26)
    15
    Your translated text is:
    Rfc qiw yzmtc rfc nmpr uyq rfc amjmp md rcjctgqgml, rslcb rm y bcyb afyllcj.
Here is the source code for the Caesar Cipher program.
 1  -- Caesar Cipher
 2
 3  MAX_KEY_SIZE = 26
 4
 5  function getMode()
 6      while true do
 7          print('Do you wish to encrypt or decrypt a message?')
 8          mode = string.lower(io.read())
 9          for index, value in ipairs({'encrypt', 'e', 'decrypt', 'd'}) do
10              if mode == value then
11                  return mode
12              end
13          end
14          print('Enter either "encrypt" or "e" or "decrypt" or "d".')
15      end
16  end
17
18  function getMessage()
19      print('Enter your message:')
20      return io.read()
21  end
22
23  function getKey()
24      key = 0
25      while true do
26          print(string.format('Enter the key number (1-%s)', MAX_KEY_SIZE))
27          key = io.read('*number')
28          if (key >= 1 and key <= MAX_KEY_SIZE) then
29              return key
30          end
31      end
32  end
33
34  function getTranslatedMessage(mode, message, key)
35      if mode:sub(1, 1) == 'd' then
36          key = -key
37      end
38      translated = ''
39
40      for i = 1, #message do
41          local symbol = message:sub(i,i)
42          if symbol == symbol:match('%a') then
43              num = string.byte(symbol)
44              num = num + key
45
46              if symbol == symbol:upper() then
47                  if num > string.byte('Z') then
48                      num = num - 26
49                  elseif num < string.byte('A') then
50                      num = num + 26
51                  end
52              elseif symbol == symbol:lower() then
53                  if num > string.byte('z') then
54                      num = num - 26
55                  elseif num < string.byte('a') then
56                      num = num + 26
57                  end
58              end
59
60              translated = translated .. string.char(num)
61          else
62              translated = translated .. symbol
63          end
64      end
65
66      return translated
67  end
68
69  mode = getMode()
70  message = getMessage()
71  key = getKey()
72
73  print('Your translated text is:')
74  print(getTranslatedMessage(mode, message, key))
Let's look at how each line works.
 1  -- Caesar Cipher
 2
 3  MAX_KEY_SIZE = 26
The first line is simply a comment. The Caesar Cipher belongs to a type of ciphers called simple substitution ciphers. Simple substitution ciphers are ciphers that replace one symbol in the plaintext with one (and only one) symbol in the ciphertext. So if a "G" was substituted with "Z" in the cipher, every single "G" in the plaintext would be replaced with (and only with) a "Z".
MAX_KEY_SIZE is a variable that stores the number 26 in it. MAX_KEY_SIZE reminds us that in this program, the key used in our cipher should be between 1 and 26.
 5  function getMode()
 6      while true do
 7          print('Do you wish to encrypt or decrypt a message?')
 8          mode = string.lower(io.read())
 9          for index, value in ipairs({'encrypt', 'e', 'decrypt', 'd'}) do
10              if mode == value then
11                  return mode
12              end
13          end
14          print('Enter either "encrypt" or "e" or "decrypt" or "d".')
15      end
16  end
The getMode() function lets the user type in whether they want to encrypt or decrypt the message. The return value of io.read() is passed to string.lower(), which returns the lowercase version of the string, and the result is stored in mode. The for loop then goes through the table {'encrypt', 'e', 'decrypt', 'd'}, and the if statement's condition checks whether the string stored in mode is equal to the current value.
This function will return mode as long as mode is equal to 'encrypt', 'e', 'decrypt', or 'd'.
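The loop over the table is one way to test membership. Another common Lua idiom, shown here only as a sketch and not as the tutorial's method, is to use the accepted words as table keys so that a single lookup answers the question. VALID_MODES and isValidMode are hypothetical names for this example.

```lua
-- Hypothetical alternative to the for loop: a table used as a set.
local VALID_MODES = { encrypt = true, e = true, decrypt = true, d = true }

local function isValidMode(input)
  -- Lowercase first, as getMode() does, then look the word up directly.
  return VALID_MODES[string.lower(input)] == true
end

print(isValidMode('Encrypt'))  -- true
print(isValidMode('d'))        -- true
print(isValidMode('hello'))    -- false
```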
18  function getMessage()
19      print('Enter your message:')
20      return io.read()
21  end
The getMessage() function simply gets the message to encrypt or decrypt from the user and uses this string as its return value.
23  function getKey()
24      key = 0
25      while true do
26          print(string.format('Enter the key number (1-%s)', MAX_KEY_SIZE))
27          key = io.read('*number')
28          if (key >= 1 and key <= MAX_KEY_SIZE) then
29              return key
30          end
31      end
32  end
The getKey() function lets the player type in the key they will use to encrypt or decrypt the message. The while loop ensures that the function only returns a valid key. A valid key here is one that is between the number values 1 and 26 (remember that MAX_KEY_SIZE will only have the value 26 because we never change it). It then returns this key. Remember that on line 27 key was set to the number version of what the user typed in, so getKey() returns a number.
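One thing to be aware of: io.read('*number') returns nil when the input is not a number, and comparing nil with a number raises an error. The sketch below shows one possible way to validate a key from a line of text instead. parseKey is a hypothetical helper that is not in the tutorial's program, and it assumes only whole numbers should count as keys.

```lua
local MAX_KEY_SIZE = 26

-- Hypothetical helper: turn a line of typed text into a valid key,
-- or return nil if the text is not a whole number from 1 to MAX_KEY_SIZE.
local function parseKey(text)
  local key = tonumber(text)  -- nil if the text is not a number
  if key and key >= 1 and key <= MAX_KEY_SIZE and key == math.floor(key) then
    return key
  end
  return nil
end

print(parseKey('13'))     -- 13
print(parseKey('27'))     -- nil (too large)
print(parseKey('hello'))  -- nil (not a number)
```

A getKey() built on this idea could read a whole line with io.read() and keep asking until parseKey() returns something other than nil.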
34  function getTranslatedMessage(mode, message, key)
35      if mode:sub(1, 1) == 'd' then
36          key = -key
37      end
38      translated = ''
getTranslatedMessage() is the function that does the encrypting and decrypting in our program. It has three parameters. mode sets the function to encryption mode or decryption mode. message is the plaintext (or ciphertext) to be encrypted (or decrypted). key is the key that is used in this cipher.
The first line in the getTranslatedMessage() function determines if we are in encryption mode or decryption mode. If the first letter in the mode variable is the string 'd', then we are in decryption mode. The only difference between the two modes is that in decryption mode, the key is set to the negative version of itself. If key was the number 22, then in decryption mode we set it to -22. The reason for this will be explained later.
translated is the string that will hold the end result: either the ciphertext (if we are encrypting) or the plaintext (if we are decrypting). We will only be concatenating strings to this variable, so we first store the blank string in translated. (A variable must be defined with some string value first before a string can be concatenated to it.)
The match() string method looks for the first match of a pattern in the string. If it finds one, it returns the matched text (or the captures, if the pattern has any). The pattern '%a' matches a single letter, so match('%a') returns the first letter found in the string, or nil if the string contains no letters. Try typing the following into the interactive shell:
    = (' '):match('%a')
    nil
    = ('a'):match('%a')
    a
    = ('1'):match('%a')
    nil
    = (''):match('%a')
    nil
As you can see, (' '):match('%a') returns nil because a space is a non-letter character. ('a'):match('%a') returns 'a' because it is a letter. ('1'):match('%a') returns nil because '1' is a non-letter character. And (''):match('%a') returns nil because the string is blank.
We will use the match() method in our program in the next few lines.
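The program compares a symbol with the result of match('%a'). For a one-character string the two are equal exactly when that character is a letter. A minimal sketch of that test, wrapped in a hypothetical isLetter helper that is not in the tutorial's program:

```lua
-- Hypothetical helper: true only for a single-character letter string.
local function isLetter(symbol)
  -- match('%a') gives back the first letter it finds, or nil if there
  -- is none, so the comparison fails for spaces, digits and punctuation.
  return symbol == symbol:match('%a')
end

print(isLetter('a'))  -- true
print(isLetter('Q'))  -- true
print(isLetter('1'))  -- false
print(isLetter(' '))  -- false
```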
Line 40's for loop iterates over each letter (remember in cryptography they are called symbols) in the message string. message:sub(i,i) returns the loop's current symbol.
The reason we have the if statement on line 42 is because we will only encrypt/decrypt letters in the message. Numbers, signs, punctuation marks, and everything else will stay in their untranslated form. The num variable will hold the number ordinal value of the letter stored in symbol. Line 44 then "shifts" the value in num by the value in key.
The upper() and lower() string methods (which are on lines 46 and 52) return the string they are called on in uppercase or lowercase. Each symbol is compared to the result of calling these methods on itself, to see whether it is uppercase or lowercase. Try typing the following into the interactive shell:
    = ('HELLO'):upper() == 'HELLO'
    true
    = ('HELLO'):lower() == 'HELLO'
    false
    = ('Hello'):lower() == 'hello'
    true
    = ('42'):upper() == '42'
    true
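The comparison ('42'):upper() == '42' is true even though '42' contains no letters, which is why the program checks that a symbol is a letter before checking its case. A small sketch of that ordering, using a hypothetical caseOf helper for illustration only:

```lua
-- Hypothetical helper: classify a single-character string.
local function caseOf(symbol)
  -- Check for a letter first; digits and punctuation are unchanged by
  -- upper(), so they would otherwise look like uppercase letters.
  if symbol ~= symbol:match('%a') then
    return 'other'
  elseif symbol == symbol:upper() then
    return 'upper'
  else
    return 'lower'
  end
end

print(caseOf('H'))  -- upper
print(caseOf('h'))  -- lower
print(caseOf('4'))  -- other
```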
The process of encrypting (or decrypting) each letter is fairly simple. We want to apply the same Lua code to every letter character in the string, which is what the next several lines of code do.
46              if symbol == symbol:upper() then
47                  if num > string.byte('Z') then
48                      num = num - 26
49                  elseif num < string.byte('A') then
50                      num = num + 26
51                  end
This code checks if the symbol is an uppercase letter. If so, there are two special cases we need to worry about. What if symbol was 'Z' and key was 4? If that were the case, the value of num here would be 94, which is the ordinal of the character '^'. But ^ isn't a letter at all. We want the ciphertext to "wrap around" to the beginning of the alphabet.
The way we can do this is to check if num has a value larger than the largest possible letter's ASCII value (which is a capital "Z", or 90). If so, then we subtract 26 (because there are 26 letters in total) from num. After doing this, the value of num is 68, which is the ASCII value for 'D'.
52              elseif symbol == symbol:lower() then
53                  if num > string.byte('z') then
54                      num = num - 26
55                  elseif num < string.byte('a') then
56                      num = num + 26
57                  end
58              end
If the symbol is a lowercase letter, the program runs code that is very similar to lines 46 through 51. The only difference is that we use string.byte('z') and string.byte('a') instead of string.byte('Z') and string.byte('A').
If we were in decrypting mode, then key would be negative. Then we would have the special case where num, after the negative key is added, might be less than the smallest possible value (string.byte('A'), that is, 65, for uppercase letters, or string.byte('a'), that is, 97, for lowercase letters). If this is the case, we want to add 26 to num to have it "wrap around".
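The two wrap-around cases can also be handled with the remainder operator. The sketch below is an alternative technique, not the one the tutorial's program uses. shiftLetter is a hypothetical name, and it relies on the fact that Lua's % operator gives a result from 0 to 25 here even when the left side is negative.

```lua
-- Hypothetical alternative: wrap around using the remainder operator.
local function shiftLetter(symbol, key)
  local base
  if symbol:match('^%u$') then
    base = string.byte('A')  -- 65
  elseif symbol:match('^%l$') then
    base = string.byte('a')  -- 97
  else
    return symbol            -- non-letters stay the same
  end
  -- Position in the alphabet (0 to 25), shifted, then wrapped with % 26.
  local offset = (string.byte(symbol) - base + key) % 26
  return string.char(base + offset)
end

print(shiftLetter('Z', 4))   -- D
print(shiftLetter('A', -4))  -- W
print(shiftLetter('!', 4))   -- !
```

A negative key undoes a positive one: shifting 'Z' by 4 gives 'D', and shifting 'D' by -4 gives 'Z' again.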
60              translated = translated .. string.char(num)
61          else
62              translated = translated .. symbol
The translated string will be appended with the encrypted/decrypted character. If the symbol was not an uppercase or lowercase letter, then the else-block on line 61 would have executed instead. All the code in the else-block does is append the original, untranslated symbol to the translated string. This means that spaces, numbers, punctuation marks, and other characters will not be encrypted or decrypted.
66      return translated
The last line in the getTranslatedMessage() function returns the translated string.
69  mode = getMode()
70  message = getMessage()
71  key = getKey()
72
73  print('Your translated text is:')
74  print(getTranslatedMessage(mode, message, key))
This is the main part of our program. We call each of the three functions we have defined above in turn to get the mode, message, and key that the user wants to use. We then pass these three values as arguments to getTranslatedMessage(), whose return value (the translated string) is printed to the user.
That's the entire Caesar Cipher. However, while this cipher may fool some people who don't understand cryptography, it won't keep a message secret from someone who knows cryptanalysis. While cryptography is the science of making codes, cryptanalysis is the science of breaking codes.
    Do you wish to encrypt or decrypt a message?
    encrypt
    Enter your message:
    Doubts may not be pleasant, but certainty is absurd.
    Enter the key number (1-26)
    8
    Your translated text is:
    Lwcjba uig vwb jm xtmiaivb, jcb kmzbiqvbg qa ijaczl.
The whole point of cryptography is that if someone else gets their hands on the encrypted message, they cannot figure out the original unencrypted message from it. Let's pretend we are the code breaker and all we have is the encrypted text:
Lwcjba uig vwb jm xtmiaivb, jcb kmzbiqvbg qa ijaczl.
One method of cryptanalysis is called brute force. Brute force is the technique of trying every single possible key. If the cryptanalyst knows the cipher that the message uses (or at least guesses it), they can just go through every possible key. Because there are only 26 possible keys, it would be easy for a cryptanalyst to write a program that prints the decrypted ciphertext of every possible key and see if any of the outputs make sense. Let's add a brute force feature to our program.
First, change lines 7, 9, and 14 (which are in the getMode() function) to look like the following. The changes add "brute force" to the question, 'brute' and 'b' to the table of accepted modes, and "brute" and "b" to the reminder message:
 5  function getMode()
 6      while true do
 7          print('Do you wish to encrypt or decrypt or brute force a message?')
 8          mode = string.lower(io.read())
 9          for index, value in ipairs({'encrypt', 'e', 'decrypt', 'd', 'brute', 'b'}) do
10              if mode == value then
11                  return mode
12              end
13          end
14          print('Enter either "encrypt" or "e" or "decrypt" or "d" or "brute" or "b".')
15      end
16  end
This will let us select "brute force" as a mode for our program. Then modify and add the following changes to the main part of the program:
    mode = getMode()
    message = getMessage()
    if mode:sub(1, 1) ~= 'b' then
        key = getKey()
    end

    print('Your translated text is:')
    if mode:sub(1, 1) ~= 'b' then
        print(getTranslatedMessage(mode, message, key))
    else
        for key = 1, MAX_KEY_SIZE do
            print(key, getTranslatedMessage('decrypt', message, key))
        end
    end
These changes make our program ask the user for a key if they are not in "brute force" mode. If they are not in "brute force" mode, then the original getTranslatedMessage() call is made and the translated string is printed.
Otherwise, we are in "brute force" mode, and we run a loop that calls getTranslatedMessage() with every key from 1 all the way up to MAX_KEY_SIZE (which is 26). This program will print out every possible translation of the message (including the key number used in the translation). Here is a sample run of this modified program:
    Do you wish to encrypt or decrypt or brute force a message?
    brute
    Enter your message:
    Lwcjba uig vwb jm xtmiaivb, jcb kmzbiqvbg qa ijaczl.
    Your translated text is:
    1  Kvbiaz thf uva il wslhzhua, iba jlyahpuaf pz hizbyk.
    2  Juahzy sge tuz hk vrkgygtz, haz ikxzgotze oy ghyaxj.
    3  Itzgyx rfd sty gj uqjfxfsy, gzy hjwyfnsyd nx fgxzwi.
    4  Hsyfxw qec rsx fi tpiewerx, fyx givxemrxc mw efwyvh.
    5  Grxewv pdb qrw eh sohdvdqw, exw fhuwdlqwb lv devxug.
    6  Fqwdvu oca pqv dg rngcucpv, dwv egtvckpva ku cduwtf.
    7  Epvcut nbz opu cf qmfbtbou, cvu dfsubjouz jt bctvse.
    8  Doubts may not be pleasant, but certainty is absurd.
    9  Cntasr lzx mns ad okdzrzms, ats bdqszhmsx hr zartqc.
    10  Bmszrq kyw lmr zc njcyqylr, zsr acpryglrw gq yzqspb.
    11  Alryqp jxv klq yb mibxpxkq, yrq zboqxfkqv fp xyproa.
    12  Zkqxpo iwu jkp xa lhawowjp, xqp yanpwejpu eo wxoqnz.
    13  Yjpwon hvt ijo wz kgzvnvio, wpo xzmovdiot dn vwnpmy.
    14  Xiovnm gus hin vy jfyumuhn, von wylnuchns cm uvmolx.
    15  Whnuml ftr ghm ux iextltgm, unm vxkmtbgmr bl tulnkw.
    16  Vgmtlk esq fgl tw hdwsksfl, tml uwjlsaflq ak stkmjv.
    17  Uflskj drp efk sv gcvrjrek, slk tvikrzekp zj rsjliu.
    18  Tekrji cqo dej ru fbuqiqdj, rkj suhjqydjo yi qrikht.
    19  Sdjqih bpn cdi qt eatphpci, qji rtgipxcin xh pqhjgs.
    20  Rciphg aom bch ps dzsogobh, pih qsfhowbhm wg opgifr.
    21  Qbhogf znl abg or cyrnfnag, ohg pregnvagl vf nofheq.
    22  Pagnfe ymk zaf nq bxqmemzf, ngf oqdfmuzfk ue mnegdp.
    23  Ozfmed xlj yze mp awpldlye, mfe npceltyej td lmdfco.
    24  Nyeldc wki xyd lo zvokckxd, led mobdksxdi sc klcebn.
    25  Mxdkcb vjh wxc kn yunjbjwc, kdc lnacjrwch rb jkbdam.
    26  Lwcjba uig vwb jm xtmiaivb, jcb kmzbiqvbg qa ijaczl.
After looking over each row, you can see that the 8th message is not garbage, but plain English! The cryptanalyst can deduce that the original key for this encrypted text must have been 8. This brute force would have been difficult to do back in the days of Caesars and the Roman Empire, but today we have computers that can quickly go through millions or even billions of keys in a short time. You can even write a program that can recognize when it has found a message in English, so you don't have to read through all the garbage text.
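Here is one rough sketch of such a recognizer. It is our own illustration under simple assumptions: it counts how many words of each candidate appear in a tiny hand-picked list of common English words and keeps the key with the highest count. The names COMMON_WORDS, decrypt, englishScore and guessKey are hypothetical, and a real recognizer would need a much larger word list.

```lua
-- A tiny, hand-picked word list (an assumption for this sketch).
local COMMON_WORDS = {
  the = true, ['and'] = true, ['not'] = true, but = true, is = true,
  be = true, to = true, of = true, that = true, with = true,
}

-- Shift every letter back by key, wrapping around with % 26.
local function decrypt(message, key)
  return (message:gsub('%a', function(symbol)
    local base = symbol:match('%u') and 65 or 97  -- 'A' or 'a'
    return string.char(base + (string.byte(symbol) - base - key) % 26)
  end))
end

-- Count how many words of the text are in the common word list.
local function englishScore(text)
  local score = 0
  for word in text:lower():gmatch('%a+') do
    if COMMON_WORDS[word] then
      score = score + 1
    end
  end
  return score
end

-- Try every key and keep the one whose output looks most like English.
local function guessKey(ciphertext)
  local bestKey, bestScore = nil, -1
  for key = 1, 26 do
    local score = englishScore(decrypt(ciphertext, key))
    if score > bestScore then
      bestKey, bestScore = key, score
    end
  end
  return bestKey, decrypt(ciphertext, bestKey)
end

local key, text = guessKey('Lwcjba uig vwb jm xtmiaivb, jcb kmzbiqvbg qa ijaczl.')
print(key, text)
```

For the ciphertext from the sample run, only the key 8 candidate contains any words from the list ("not", "be", "but" and "is"), so the sketch picks key 8.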
Computers are very good at doing mathematics. When we create a system to translate some piece of information into numbers (such as we do with text and ASCII or with space and coordinate systems), computer programs can process these numbers very quickly and efficiently.
But while our Caesar cipher program here can encrypt messages and keep them secret from people who have to figure them out with pencil and paper, it won't keep them secret from people who know how to get computers to process information for them. (Our brute force mode proves this.) There are other ciphers, however, that are so advanced that nobody knows how to decrypt the secret messages they make. (Except for the people with the key, of course!)
A large part of figuring out how to write a program is figuring out how to represent the information you want to manipulate as numbers. I hope this tutorial has especially shown you how this can be done.