Labels

3G (1) 8600GT (1) AI (4) amazon (1) API (1) apple (3) apple mail (1) atlassian (1) audio (1) bambo (1) Bamboo (1) bloat (1) boost (1) bugbear (1) C++ (5) calling conventions (1) cdecl (1) chromecast (1) CI (1) compiler (1) continuous integration (1) coursera (1) custom domain (1) debugging (1) deltanine (1) diagnosis (1) diy (5) DLL (1) dns (1) don't be evil (1) ec2 (1) education (1) electronics (1) express checkout (1) fail (6) fink (1) firewire (1) free hosting (1) GAE (1) google (1) Google App Engine (4) H170 (1) hackerx (1) hackintosh (1) Haskell (3) homebrew (2) i1394 (1) icloud (2) iOS 9 (1) ipad2 (2) jobhunting (2) lag (1) letsencrypt (2) libjpeg (1) linux (1) mac (2) mbcs (1) mechanic (1) memory (1) MFC (3) Microsoft (1) migration (1) ML (1) mobile (1) movi (1) MSBuild (1) music (1) naked domain (1) NLP (2) o2 sensor (1) obd (1) Optiplex960 (1) osx (1) outlook express (1) payments (1) paypal (1) photos (2) PIL (1) Project Euler (1) projectmix (1) python (2) raspberrypi (3) recruitment (1) renwal (1) skylake (1) soundcloud (1) ssl (2) stdcall (1) stripe (1) subaru (2) supermemo (1) supermemo anki java (1) sync (2) Telstra (1) tests (1) thunderbird (1) udacity (1) unicode (1) Uniform Cost Search (1) university (1) upgrade (2) vodafail (1) vodafone (1) VS2010 (1) vs2013 (1) VS6.0 (1) weather (1) win (1) Win32 (1) Z170 (1)

Saturday, 28 January 2012

Decoding Shredded Messages in Haskell using AI techniques

Part 2 of my Introduction to Artifical Intelligence programming assignment was to decode a message,
which had been shredded and reassembled as a random jumble of strips like so:


de|  | f|Cl|nf|ed|au| i|ti|  |ma|ha|or|nn|ou| S|on|nd|on
ry|  |is|th|is| b|eo|as|  |  |f |wh| o|ic| t|, |  |he|h 
ab|  |la|pr|od|ge|ob| m|an|  |s |is|el|ti|ng|il|d |ua|c 
he|  |ea|of|ho| m| t|et|ha|  | t|od|ds|e |ki| c|t |ng|br
wo|m,|to|yo|hi|ve|u | t|ob|  |pr|d |s |us| s|ul|le|ol|e 
 t|ca| t|wi| M|d |th|"A|ma|l |he| p|at|ap|it|he|ti|le|er
ry|d |un|Th|" |io|eo|n,|is|  |bl|f |pu|Co|ic| o|he|at|mm
hi|  |  |in|  |  | t|  |  |  |  |ye|  |ar|  |s |  |  |. 


You could brute force this and generate all 19! combinations, but this is not the AI way.

The approach I took was to greedily combine strips from left to right until all strips have been used. That is, choose an initial starting strip then try all remaining strips, then choose the one that results in the highest score for the combined strips based on trigram frequency. Rinse and repeat until all strips used.

We then do this again, choosing a new starting strip from each of the candidates, and the result with the best score is our winner.

Sounds simple, doesn't it? However, to arrive at the correct result required some tweaking of the scoring.
In my first attempt, the first 15 or so strips would be correct but the last 4 would be jumbled... I think this was because the last line is mostly blank, which upset the scoring, which was based on the sum of the log probability score for each row (see my previous post for more info on trigram probability scoring).

Using only the first 6 out of 8 rows for scoring improved this significantly, however the correct answer was actually only the third highest scoring result. The other two were almost correct but started with a column of spaces. Tweaking the scoring code to punish rows with a leading space (assuming the text is left justified) brought the correct answer to the top of the list. Hooray!

This was quite a challenge, especially as it was only my third ever Haskell program.

If you're interested in seeing how I tackled this problem in a functional language like Haskell the source code is hosted on ideone. Any feedback appreciate!

Finally, here is the output:



Shredded Message Solver (ai-class)

shredded message:
de|  | f|Cl|nf|ed|au| i|ti|  |ma|ha|or|nn|ou| S|on|nd|on
ry|  |is|th|is| b|eo|as|  |  |f |wh| o|ic| t|, |  |he|h 
ab|  |la|pr|od|ge|ob| m|an|  |s |is|el|ti|ng|il|d |ua|c 
he|  |ea|of|ho| m| t|et|ha|  | t|od|ds|e |ki| c|t |ng|br
wo|m,|to|yo|hi|ve|u | t|ob|  |pr|d |s |us| s|ul|le|ol|e 
 t|ca| t|wi| M|d |th|"A|ma|l |he| p|at|ap|it|he|ti|le|er
ry|d |un|Th|" |io|eo|n,|is|  |bl|f |pu|Co|ic| o|he|at|mm
hi|  |  |in|  |  | t|  |  |  |  |ye|  |ar|  |s |  |  |. 

Top 3 results...
score: -1619.5501131298224

Claude Shannon founded information    
theory, which is the basis of         
probabilistic language models and     
of the code breaking methods that     
you would use to solve this problem,  
with the paper titled "A Mathematical 
Theory of Communication," published   
in this year.                         

score: -1631.76535223795

nnon founded informationClaude Sha    
ich is the basis of     theory, wh    
tic language models and probabilis    
e breaking methods that of the cod    
use to solve this probleyou would   m,
aper titled "A Mathematiwith the pl ca
Communication," publisheTheory of   d 
ar.                     in this ye    

score: -1632.7894584577352

nded informationClaude Shannon fou    
he basis of     theory, which is t    
uage models and probabilistic lang    
ng methods that of the code breaki    
olve this probleyou would use to s  m,
led "A Mathematiwith the paper titl ca
ation," publisheTheory of Communic  d 
                in this year.     

No comments:

Post a Comment