A long time ago a good friend of mine told me this puzzle. Letâs say there is a monkey hitting a typewriter at random. And letâs assume he is only hitting letters and that the chance for each letter to get hit is equal. Do you have to wait on average longer, the same or shorter before the monkey will have typed âabracadabraâ or âabcdefghijkâ (i.e. a string with an equal amount of characters)? Donât start reading the next paragraph if you want to solve the puzzle, since I am going to give away the answer.
Most people tend to say, well, it must be âabracadabraâ that on average shows up first. Abracadabraâs can overlap, so it is just more likely that you have that sequence and therefore you have to wait less long on average before it appears. Sounds reasonable enough, but is of course wrong.
A smaller group of people tends to say that it must be equally long. If you randomly select a character in the stream of characters coming from the monkey and ask yourself, how likely is it that at this point âabracadabraâ or âabcdefghijkâ starts, youâll see that these chances must be equal, that is, once the monkey starts typing the first letter of the sequence, he has to type exactly one letter as the next and the chance of him doing that is 1 in 26 and this is independent on what he needs to type. Again, sounds reasonable but is also wrong (or more to the point, this is correct, but that was not the question).
That pretty much leaves only the third option, i.e. abracadabra takes longer. I have two explanations here. The first one combines the previous two answers. Yes, abracadabraâs can overlap and yes the chance for abracadabra to start at a certain point, is the same as for a-k. So if we have a lengthy string of characters typed by our monkey, we expect that the number of occurrences for abracadabra to be the same as for a-k. But since abracadabraâs can overlap, they take up less space so to say (i.e. a larger portion of this string is made up of characters that are not part of abracadabra then not part of a-k). Therefore the average distance between groups of characters that are part of abracadabra must be larger than the average distance between groups of characters being part of a-k. Since when we are starting to type we are not in the middle of a group of abracadabra characters, it will take longer before we encounter some group first.
Not everybody is convinced by this explanation. Another way of looking at it, is to look what happens when the monkey types and only needs the last character of any of the targets. In the case where he is trying to type abracadabra, there are two options, either he types the a, or he doesnât. If he doesnât, the monkey will have start all over again. In the case of a-k, the monkey as three options: he types a âkâ and weâre done, he types b-j and we have to start over, or he types an âaâ and the monkey has blown this chance of typing a-k, but we already have the first letter of the next try, something that doesnât work with abracadabra.
If youâre not convinced still, I wrote a python program to simulate this. I am not using abracadabra, but ab vs aa. And my monkey doesnât type all letters, but only a,b,c,d,e and f. Still, this shouldnât make a difference. Except for one thing: on average you have to wait 34 characters before you see âabâ and around 40 before you see âaaâ . However, if you let the monkey type and you count how often âabâ occurs before âaaâ, youâll see that this is equal. Confused? Well, the first chance of either âabâ or âaaâ is just after the first âaâ; if the monkey types an âaâ or a âbâ weâre done. However, if the monkey types an âaâ, âaaâ wins, but âabâ has good chance of being just one after. If âabâ wins, âaaâ has no such chance.