which had been shredded and reassembled as a random jumble of strips like so:
de| | f|Cl|nf|ed|au| i|ti| |ma|ha|or|nn|ou| S|on|nd|on ry| |is|th|is| b|eo|as| | |f |wh| o|ic| t|, | |he|h ab| |la|pr|od|ge|ob| m|an| |s |is|el|ti|ng|il|d |ua|c he| |ea|of|ho| m| t|et|ha| | t|od|ds|e |ki| c|t |ng|br wo|m,|to|yo|hi|ve|u | t|ob| |pr|d |s |us| s|ul|le|ol|e t|ca| t|wi| M|d |th|"A|ma|l |he| p|at|ap|it|he|ti|le|er ry|d |un|Th|" |io|eo|n,|is| |bl|f |pu|Co|ic| o|he|at|mm hi| | |in| | | t| | | | |ye| |ar| |s | | |.
You could brute force this and generate all 19! combinations, but this is not the AI way.
The approach I took was to greedily combine strips from left to right until all strips have been used. That is, choose an initial starting strip then try all remaining strips, then choose the one that results in the highest score for the combined strips based on trigram frequency. Rinse and repeat until all strips used.
We then do this again, choosing a new starting strip from each of the candidates, and the result with the best score is our winner.
Sounds simple, doesn't it? However, to arrive at the correct result required some tweaking of the scoring.
In my first attempt, the first 15 or so strips would be correct but the last 4 would be jumbled... I think this was because the last line is mostly blank, which upset the scoring, which was based on the sum of the log probability score for each row (see my previous post for more info on trigram probability scoring).
Using only the first 6 out of 8 rows for scoring improved this significantly, however the correct answer was actually only the third highest scoring result. The other two were almost correct but started with a column of spaces. Tweaking the scoring code to punish rows with a leading space (assuming the text is left justified) brought the correct answer to the top of the list. Hooray!
This was quite a challenge, especially as it was only my third ever Haskell program.
If you're interested in seeing how I tackled this problem in a functional language like Haskell the source code is hosted on ideone. Any feedback appreciate!
Finally, here is the output:
Shredded Message Solver (ai-class) shredded message: de| | f|Cl|nf|ed|au| i|ti| |ma|ha|or|nn|ou| S|on|nd|on ry| |is|th|is| b|eo|as| | |f |wh| o|ic| t|, | |he|h ab| |la|pr|od|ge|ob| m|an| |s |is|el|ti|ng|il|d |ua|c he| |ea|of|ho| m| t|et|ha| | t|od|ds|e |ki| c|t |ng|br wo|m,|to|yo|hi|ve|u | t|ob| |pr|d |s |us| s|ul|le|ol|e t|ca| t|wi| M|d |th|"A|ma|l |he| p|at|ap|it|he|ti|le|er ry|d |un|Th|" |io|eo|n,|is| |bl|f |pu|Co|ic| o|he|at|mm hi| | |in| | | t| | | | |ye| |ar| |s | | |. Top 3 results... score: -1619.5501131298224 Claude Shannon founded information theory, which is the basis of probabilistic language models and of the code breaking methods that you would use to solve this problem, with the paper titled "A Mathematical Theory of Communication," published in this year. score: -1631.76535223795 nnon founded informationClaude Sha ich is the basis of theory, wh tic language models and probabilis e breaking methods that of the cod use to solve this probleyou would m, aper titled "A Mathematiwith the pl ca Communication," publisheTheory of d ar. in this ye score: -1632.7894584577352 nded informationClaude Shannon fou he basis of theory, which is t uage models and probabilistic lang ng methods that of the code breaki olve this probleyou would use to s m, led "A Mathematiwith the paper titl ca ation," publisheTheory of Communic d in this year.