Algorithm to decode substitution ciphers

A few years ago, I read The Code Book by Simon Singh. At the end of the book, Singh provides a Cipher Challenge – 10 exercises to decode.

For this exercise, I revisit the first challenge – a basic substitution cipher.

It has been years since I looked at this book or the exercise, and I don’t remember how long it took me to decode it by hand back then. So I decided to try it again. It took me around 40 minutes to decipher it.

A post shared by Mithru Vigneshwara (@mithru) on

Now, I’d like to challenge myself to write an algorithm that can decipher this faster. At this stage I don’t know how that would be, but I have a few thoughts.

First, let me briefly explain how I decoded this by hand.

I knew this was a basic substitution cypher that did not purge the spaces between words (the book said so), therefore I began by looking at common patterns of small words. JPX was quite common. I then looked up the most common words in English. So I started with the assumption that J = T, P = H, X = E. I then found a JPXT┬áin the text. It could only mean THEM or THEN. The JPXT was the start of a sentence, and it was followed by JPX. ‘THEN THE’ sounded more probably than ‘THEM THE’, so I also started assuming T = N. Similarly, I went through the other letters until I deciphered it enough to just guess the missing letters.

Now, on to writing an algorithm that does this by itself. Again, I don’t fully know how this would work, but I’m going to experiment with a genetic algorithm that knows a list of common words.

The algorithm will start by producing randomized ciphers, and attempting to decipher any text given to it.

The fitness score of the different deciphered mutations will likely be based on the occurrence of the common words. I don’t know if this will work out, but it’s a fun experiment nevertheless.

So far I’ve written a processing sketch that encodes and decodes phrases. I’ll likely build upon this.

Challenges I foresee:

1. I might need to look up a much larger list of common words.

2. The fitness scoring system should probably be smarter than what I currently have in mind.

So that’s the plan. Anything under this sentence will be updated to show results.


 

Share on Google+Share on LinkedInTweet about this on TwitterPin on PinterestShare on Facebook

Mithru

Leave a Reply

Your email address will not be published. Required fields are marked *