This page requires Javascript for the interactive bits.
The Vigenère cipher was created by Giovan Battista Bellaso in 1553 and was mis-attributed to Blaise de Vigenère. The cipher was very strong for it's time and it took nearly 300 years before reliable cracking methods were discovered. This earned it the title "le chiffrage indéchiffrable". In this post I will detail some modern statistical techniques that can be used to break the cipher. For a full history of the cipher and more approaches at breaking it I recommend grabbing a copy of The Code Book by Simon Singh.
Vigenère ciphers come in several variants. The most common version utilizes a secret key phrase that is shared between sender and receiver. To use this method first write the message down and then write the letters of the secret over each message letter. Repeat the secret as many times as necessary to cover the entire text. Here the key "PIZZA" is overlaid on top of a secret message.
Key: PIZZ AP IZZ APIZ ZA PIZZ APIZZA
Message: MEET AT THE CAFE ON MAIN STREET
Each letter of the secret key will be used to represent a specific Ceaser cipher1. A Ceaser cipher shifts the alphabet by a certain number of letters. For example, a key letter of "P" would shift the first letter by 16 places ("P" being the sixteenth letter in the alphabet). The first letter in the plaintext is "M", so this will be shifted 16 times (M -> N -> O ...) and eventually turned into a "B" (letters that wrap past Z start back at A and keep going).
This is repeated for every letter in the plaintext. Here is the encoded message:
Key: PIZZ AP IZZ APIZ ZA PIZZ APIZZA
Message: MEET AT THE CAFE ON MAIN STREET
-----------------------------------------
Cipher: BMDS AI BGD CPND NN BIHM SIZDDT
To assist with the encoding and decoding of a message a Vigenère Square is useful. This is also called the tabula recta 2 and is used for both the encryption and decryption of a message.
Reference Vigenère Square
|   | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
| B | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | A |
| C | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | A | B |
| D | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | A | B | C |
| E | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | A | B | C | D |
| F | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | A | B | C | D | E |
| G | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | A | B | C | D | E | F |
| H | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | A | B | C | D | E | F | G |
| I | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | A | B | C | D | E | F | G | H |
| J | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | A | B | C | D | E | F | G | H | I |
| K | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | A | B | C | D | E | F | G | H | I | J |
| L | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | A | B | C | D | E | F | G | H | I | J | K |
| M | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | A | B | C | D | E | F | G | H | I | J | K | L |
| N | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | A | B | C | D | E | F | G | H | I | J | K | L | M |
| O | O | P | Q | R | S | T | U | V | W | X | Y | Z | A | B | C | D | E | F | G | H | I | J | K | L | M | N |
| P | P | Q | R | S | T | U | V | W | X | Y | Z | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| Q | Q | R | S | T | U | V | W | X | Y | Z | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P |
| R | R | S | T | U | V | W | X | Y | Z | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q |
| S | S | T | U | V | W | X | Y | Z | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R |
| T | T | U | V | W | X | Y | Z | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S |
| U | U | V | W | X | Y | Z | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T |
| V | V | W | X | Y | Z | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U |
| W | W | X | Y | Z | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V |
| X | X | Y | Z | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W |
| Y | Y | Z | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X |
| Z | Z | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y |
Encoding with the Square
Encrypting a message via the square is a straightforward task. The column headers represent the current letter of the key and the row headings represent the letter of the plain-text message. By hovering over the "P" column and looking at column "M" the resulting square is the letter "B". Try this for the next several encodings and compare the result to the ciphertext above.
Decoding using the Square
The square can likewise be used to decrypt a message if the key is known. The row header is the current key letter. The cell letter in this row that matches the ciphertext letter will have a column header of the plaintext message. Based on the above cipher the first plaintext letter can be found by hovering over row "P". Next read down the row until the letter of the ciphertext ("B"). The column header will be the resulting plaintext entry of "M". Try this for a few more entries in the ciphertext.
Encryption Tool
Encrypting and decrypting a cipher encoded with the Vigenère square can be a time-consuming process. The following forms let you encrypt messages with a secret key. Whatever is placed in these two fields will be used later in the decryption example. I have chosen a key and text here that work well for the examples later. Feel free to experiment!
Cracking the Cipher
This section is a fully live demo! Modify the above secrets and messages to see this section change. You can also replace the result text above with your own ciphertext and this section will attempt to decrypt it!
Several methods exist to break the Vigenère cipher. This section will attempt to use a modified version of the Freidman test3 (or kappa test) to crack the cipher above using statistical analysis. There are three phases to this method:
- Discover the length of the encryption key
- Derive the most likely letter for each key character position
- Decode the cipher based on the discovered key
Phase 1 - Discover Encryption Key Length
There are several steps involved in this process:
- Chose a number (N) equal to the anticipated key length
- Break the ciphertext into groups every N characters to construct the columns.
- Calculate the index of coincidence (IC) for every column and average them together.
- Compare this result to the English index of coincidence (~1.73).
- Repeat for another key length until all anticipated lengths have been checked.
Breaking into Columns
The above ciphertext has been broken into the following five columns as an example.
Calculating the IC Value
The index of coincidence is calculated for each column, in this case a total of five times. These values are then averaged together for all groups. This result IC will be very close to the English IC of ~1.73 if the correct number of columns (and thus key length) was chosen.
Here is the equation for calculating the IC:
Let's break this down into two components and calculate the IC for the first grouping above:
The Numerator
The numerator states that we must calculate the frequency ( 𝑓 ), or number of occurrences, for each letter and multiply it by its own frequency minus one. All of these values will be summed together.
Here are the frequencies of each letter from the first column in the above section:
Any characters that don't appear in the text do not need to be summed. Also note that single letters disappear due to the multiply by 0.
The Denominator
The denominator takes the total number of letters N, multiplies it by itself minus 1 and divides the result by the total number of letters that compose the alphabet (26). This last division is a normalizing operation.
Result
In short, the IC is a measure of how "rough" a letter distribution is. Since letters in an alphabet are only compared with themselves during the summations it doesn't matter if the letters have been shifted (such as 'A -> D', "B -> E", ...) when calculating this value!
These two pages have more information on this value if you wish to read further:
Wikipedia - Index of Coincidence
Practical
Cryptography - Index of Coincidence
Comparing all IC Values
Here is a table of the IC values for various key lengths (column counts) based on the input above with the most likely length candidate highlighted based on the proximity to the English IC value (~1.73). The actual key value has been marked with an ">".
This method is not perfect but on longer texts it should converge on the correct answer. If the sample text you provided doesn't provide a match check the error margins out for the real value. The proximity should have a low (close) value assuming an English text was provided. Typically, an error will manifest as a multiple of the key length (such as 16 being detected when 4 was the correct key length).
Phase 2 - Derive the Key
At this point the key length has been obtained and the groups have been found. Each group is a simple Caesar cipher shifted a certain number of letters (the individual key letter value). Frequency analysis can be used to break each group and obtain the resulting key.
Basic Frequency Chart
Each group will need to be calculated independently since it is very likely each group uses a different letter in the key and thus has a different number of shifts based on the Vigenère square's output.
To continue the demonstration I have cheated and used the actual key length from the original message just in case the chosen best IC value wasn't quite correct. In a real scenario multiple key lengths might require testing.
The previous step should have resulted in a key length of . Here is the first group created by taking every letters out of the ciphertext:
This table shows the letter distribution for English text as well as the letter distribution in the group above:
Visually you may see a pattern immediately between the two charts where the Group 1 distribution could be slid "upwards" or "downwards" to make most of the bars line up similarly by shape to an English distribution. The distance of this shift is the same distance from "A" our key's first character is. For example a shift upwards of 11 to line up the charts would mean our key's value is 11 characters down the alphabet from "A" (which would be "L").
This is a simplistic way to match two graphs but there is a better way to statistically match two frequency graphs.
Chi-squared Method
The Chi-squared calculation relies upon knowledge of the normal distribution of letters in the English language to work (which we already have from the above table).
The Chi-squared formula is:
where C is the count of each letter and E is the expected count of the letter (based on our English distribution). The result of the Chi-squared formula will be smaller the more similar the distributions are. Note that these are the counts of expected letters, not the probabilities of letters.
Here is a table showing the Chi-squared distribution for the first group. Each row represents shifting the letters "back" by one each iteration with a normal Caeser cipher shift before recalculation.
Now the shift can be applied to the letter "A" to determine the letter used for this specific group. In this case the first group (and thus the first key letter) appears to be the highlighted row since this has the lowest value.
Chi-squared for all Groups
By running the Chi-squared routine on each group the encryption key is probably the following:
Note! This might not be correct, the statistical methods aren't perfect and shorter texts / key lengths tend to be more difficult. Try a longer starting message and see if the results are produced as expected!
Phase 3 - Decrypting the Text
This is the simplest step as it only involves the reverse operation used to encrypt the text. See the first section of this page on how to decrypt a message using the tabula recta. Here is the resulting plaintext after following the steps in this section.
Conclusion and Further Reading
At this point I encourage readers to play with above examples. Here's a list of some things to try:
- What happens if the key is a single character?
- What about the key "A"?
- How does using the same letter for a long key ("BBBBB") impact the analysis?
- Can the same input text be repeated a few times to increase the chances of a successful analysis?
- What happens if you run the ciphertext through another Vigenère square with the same key? Different key?
I hope you now have a basic understanding of how the Vigenère cipher works, some methods on how to attack the cipher, and can appreciate all the work done by cryptanalysts on this over the past several hundred years.
Footnotes and References
The sample text provided at the start of this demonstration was taken from the WikiSource library:
Edgar
Allan Poe - The Raven.
You can find the source code for the demos on this page here.
I used LaTeX to SVG for rendering the IC equation in this post.