Labels

3G (1) 8600GT (1) AI (4) amazon (1) API (1) apple (3) apple mail (1) atlassian (1) audio (1) bambo (1) Bamboo (1) bloat (1) boost (1) bugbear (1) C++ (5) calling conventions (1) cdecl (1) chromecast (1) CI (1) compiler (1) continuous integration (1) coursera (1) custom domain (1) debugging (1) deltanine (1) diagnosis (1) diy (5) DLL (1) dns (1) don't be evil (1) ec2 (1) education (1) electronics (1) express checkout (1) fail (6) fink (1) firewire (1) free hosting (1) GAE (1) google (1) Google App Engine (4) H170 (1) hackerx (1) hackintosh (1) Haskell (3) homebrew (2) i1394 (1) icloud (2) iOS 9 (1) ipad2 (2) jobhunting (2) lag (1) letsencrypt (2) libjpeg (1) linux (1) mac (2) mbcs (1) mechanic (1) memory (1) MFC (3) Microsoft (1) migration (1) ML (1) mobile (1) movi (1) MSBuild (1) music (1) naked domain (1) NLP (2) o2 sensor (1) obd (1) Optiplex960 (1) osx (1) outlook express (1) payments (1) paypal (1) photos (2) PIL (1) Project Euler (1) projectmix (1) python (2) raspberrypi (3) recruitment (1) renwal (1) skylake (1) soundcloud (1) ssl (2) stdcall (1) stripe (1) subaru (2) supermemo (1) supermemo anki java (1) sync (2) Telstra (1) tests (1) thunderbird (1) udacity (1) unicode (1) Uniform Cost Search (1) university (1) upgrade (2) vodafail (1) vodafone (1) VS2010 (1) vs2013 (1) VS6.0 (1) weather (1) win (1) Win32 (1) Z170 (1)

Tuesday 10 January 2012

Solving Caesar Ciphers Using Haskell

I recently completed the free online Stanford Universivity Introduction to Artificial Intelligence class, taught by Peter Norvig and Sebastion Thrun. I can't highly recommend this course enough if you are interested in AI - and did i mention it's free?

Anyway, one of the optional programming assignments was to decode the following message:

"Esp qtcde nzyqpcpynp zy esp ezatn zq Lcetqtntlw Tyepwwtrpynp hld spwo le Olcexzfes Nzwwprp ty estd jplc."

The message is encoded using a Caesar Cipher - a very simple shift cipher, where each letter is substituted for another a fixed number of positions down the alphabet. Those of you old enough to remember Usenet may have encountered rot13, which is an example of a Caesar cipher.

The key to solving this problem is probabilistic analysis of the text. My first, rather naive approach based on intuition rather a strict application of probability theory was simply to analyse the text with the assumption that the most frequent letter in English is 'e'. Using AI jargon, we could say I used a probabilistic unigram letter model.

Anyway, I thought it was a great opportunity to practice my freshly minted Haskell skills, and to my surprise this naive approach actually worked!

Blogger does a lousy job of inlining code so I have hosted the code here on ideone.com, an awesome site that not only displays code with syntax highlighting, it can even take input, run the code and display the output. Thanks to Maxim on aiqus for improving my Haskell code.

Flush with my success, I tried a more sophisticated approach. This time I would use a trigram letter model, which are basically sequences of three adjacent letters- for example, one of the most common letter trigrams in English is 'THE". The approach was to generate all 26 possible decoding of the string, and score each one based on the trigram probabilities, choosing the one with the highest score as the most likely candidate. A few tricks that I learned in AI class were employed here - I used Laplace Smoothing to ensure that unlikely trigrams that I didn't have scores for would not invalidate my probability score due to multiplying by zero.  I also used the log of the probabilities so I could simply sum them and didn't have to deal with ridiculously small numbers.

The source code, along with the trigram input data and output can be found here on here on oneide.com.
I have copied the output below

Caesar Cypher Solver (ai-class)

encoded message:
Esp qtcde nzyqpcpynp zy esp ezatn zq Lcetqtntlw Tyepwwtrpynp hld spwo le Olcexzfes Nzwwprp ty estd jplc.

The top 3 candidates based on trigram letter probabilities for english language
The first conference on the topic of Artificial Intelligence was held at Dartmouth College in this year.
score: -742.9796021883617

Nby zclmn wihzylyhwy ih nby nijcw iz Ulnczcwcuf Chnyffcayhwy qum byfx un Xulngionb Wiffyay ch nbcm syul.
score: -913.8004146160043

Esp qtcde nzyqpcpynp zy esp ezatn zq Lcetqtntlw Tyepwwtrpynp hld spwo le Olcexzfes Nzwwprp ty estd jplc.
score: -921.9751546970537

I would appreciate any comments on my Haskell code, I'm a bit of a novice...  and any other feedback would be greatly appreciated.


          
          







3 comments:

  1. Cool.
    Paul can you make it so that you give it a page of an english novel, and the cypher message, and it does some kind of correlation between the shifted cyphers and the text you give it to work out which is english? That would impress me.

    ReplyDelete
  2. I would if I could understand what you just said...

    ReplyDelete