steady-rollin.com: Many people have encountered the "bush hid the facts" bug in Windows Notepad. Some view it as an Easter egg, some as a bug. Here's the explanation with examples.
Nov 17, 2006
interiot Nov 17, 2006
On WinXP at least, Notepad.exe calls IsTextUnicode(). Microsoft's man page (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_81np.asp) notes that "The ... tests use statistical analysis. These tests are not foolproof. The statistical tests assume certain amounts of variation between low and high bytes in a string, and some ASCII strings can slip through." It's not really about where the spaces are specifically (spaces are just one part of the bit string, after all); it's more a matter of finding cases where the statistical tests give a (probably) incorrect answer. Although this specific man page doesn't mention it, these sorts of functions work significantly better on longer strings of text. For strings this short, it's legitimately statistically difficult to guess what encoding is used for the file. This is why including metadata that explicitly specifies the text encoding used is a very good thing.
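To make the ambiguity concrete, here is a minimal Python sketch of what happens when the 18 ASCII bytes of the phrase are read as little-endian UTF-16 instead. It does not call or reproduce IsTextUnicode(); the helper name `reinterpret_as_utf16le` is made up for illustration, and it assumes an even-length, pure-ASCII input (an odd number of bytes would fail to decode).

```python
def reinterpret_as_utf16le(text: str) -> str:
    """Encode text as ASCII bytes, then decode the same bytes as UTF-16LE.

    Hypothetical helper for illustration only; it mimics a wrong encoding
    guess, not the statistical tests inside IsTextUnicode().
    """
    raw = text.encode("ascii")      # one byte per character
    return raw.decode("utf-16-le")  # two bytes per code unit, low byte first


phrase = "Bush hid the facts"
misread = reinterpret_as_utf16le(phrase)

# Each pair of ASCII bytes becomes one 16-bit code unit. For example,
# "B" (0x42) followed by "u" (0x75) becomes U+7542.
code_points = [f"U+{ord(ch):04X}" for ch in misread]

# Every resulting code unit lands in the CJK Unified Ideographs block.
all_cjk = all(0x4E00 <= ord(ch) <= 0x9FFF for ch in misread)

if __name__ == "__main__":
    print(len(phrase), "ASCII bytes ->", len(misread), "UTF-16 code units")
    print(code_points)
    print("all in U+4E00..U+9FFF:", all_cjk)
```

Both readings of the bytes are valid text, which is why a detector with nothing else to go on has to guess.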
celisyn Nov 17, 2006
Found this under the comments of a previous Digg story. It's just a clearer explanation of what happens (pretty much what the comment above mine says): http://apipes.blogspot.com/2006/06/this-api-can-break.html
deohieu Nov 17, 2006
Oh, unfortunately it's displayed correctly here in Digg... no Chinese characters :(
newpunk Nov 17, 2006
This bug isn't present in Vista. Coolness... but Vista is hurting, I need to switch back to XP Professional.
mush0010 Nov 17, 2006
Amazing concept and conclusion of the whole matter! I think I peed a little after reading about bisexual trees touching mountain dogs inappropriately.
nawin Nov 17, 2006
I already explained this on JCXP.net: http://www.jcxp.net/forums/index.php?s=&showtopic=7966&view=findpost&p=117075 The only difference between this article and mine is that mine was not put up on Digg.
stevelucky Nov 18, 2006
How about "this site can break my browser"? This site crashes Opera for me every time. Weird.
magicmarc Nov 21, 2006
This was now buried. Will that work?
dandv Nov 30, 2007
It's a UTF-16 encoding detection bug in Notepad, and it happens for more phrases (see http://en.wikipedia.org/wiki/Bush_hid_the_facts). PS1: The original link in this Digg article is dead. PS2: At the time I write this, Digg says there are 92 comments, but only 1 is visible, even after expanding the full tree.
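As a rough illustration of why more than one phrase can be affected, the sketch below checks whether a byte string, read as UTF-16LE, would consist entirely of CJK ideographs. This is a toy stand-in, not the logic Notepad or IsTextUnicode() actually uses; the function name `looks_like_cjk_utf16le` and the sample phrases other than "Bush hid the facts" are illustrative choices, not taken from the comments above.

```python
def looks_like_cjk_utf16le(raw: bytes) -> bool:
    """Toy check: True if raw, read as UTF-16LE, is entirely CJK ideographs.

    Hypothetical helper; a crude stand-in for a real encoding detector.
    """
    if len(raw) == 0 or len(raw) % 2 != 0:
        return False  # UTF-16 needs a whole number of 2-byte code units
    # Low byte comes first in little-endian order.
    units = [raw[i] | (raw[i + 1] << 8) for i in range(0, len(raw), 2)]
    return all(0x4E00 <= u <= 0x9FFF for u in units)


candidates = [
    "Bush hid the facts",
    "this app can break",
    "hello",         # odd length, cannot be whole UTF-16 code units
    "Hello, World",  # the pair "o," gives U+2C6F, outside the CJK block
]
results = {p: looks_like_cjk_utf16le(p.encode("ascii")) for p in candidates}

if __name__ == "__main__":
    for phrase, flagged in results.items():
        print(f"{phrase!r}: {flagged}")
```

Passing this toy check only means the bytes have a second plausible reading; it does not by itself predict that Notepad will misdetect a given file.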