{"id":450,"date":"2011-11-10T23:17:07","date_gmt":"2011-11-10T22:17:07","guid":{"rendered":"http:\/\/www.nobugs.org\/blog\/?p=450"},"modified":"2011-11-10T23:21:04","modified_gmt":"2011-11-10T22:21:04","slug":"the-difficulties-and-dangers-of","status":"publish","type":"post","link":"https:\/\/www.nobugs.org\/blog\/archives\/2011\/11\/10\/the-difficulties-and-dangers-of\/","title":{"rendered":"there is a great difference"},"content":{"rendered":"<p>For some reason it occurred to me tonight to find the longest common substring in various classical novels.<\/p>\n<p>There&#8217;s a bunch of interesting algorithmics involved here.  The naive O(n<sup>3<\/sup>) approach is slower than a slow snail on a slow day.  Dynamic programming gets you to an easy-to-understand O(n<sup>2<\/sup>) solution, but that&#8217;s still impractically slow for story-length strings.  Things start getting really clever with <a href=\"http:\/\/en.wikipedia.org\/wiki\/Ukkonen%27s_algorithm\">Ukkonen&#8217;s algorithm<\/a> and now we&#8217;re into practical runtime territory.  You could get a constant-factor speedup by working on sequences of intern&#8217;d words rather than chars, but the above algorithm runs in under a minute as-is so I didn&#8217;t bother. <\/p>\n<p>I downloaded and normalised some books from Project Gutenberg and ran my longest-common-substring on them.<\/p>\n<ul>\n<li>Emma vs Pride and Prejudice:  &#8220;when the ladies returned to the drawing room&#8221;<\/li>\n<li>Dracula vs Frankenstein: &#8220;between two and three in the morning&#8221;<\/li>\n<li>Dracula vs Pride and Prejudice: &#8220;but there was much to be talked of&#8221;<\/li>\n<li>Dracula vs. Fiat Money Inflation in France: &#8220;the difficulties and dangers of&#8221;<\/li>\n<li>Treasure Island vs Dracula: &#8220;threw himself on his knees and held&#8221;<\/li>\n<li>Treasure Island vs Jekyll &#038; Hyde: &#8220;and to make assurance doubly sure&#8221;<\/li>\n<li>US Constitution vs Dracula: &#8220;to the best of my ability&#8221;<\/li>\n<\/ul>\n<p>.. and last, but most self-referentially not least &#8230;<\/p>\n<ul>\n<li>Emma vs Frankenstein: &#8220;there is a great difference&#8221;.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>For some reason it occurred to me tonight to find the longest common substring in various classical novels. There&#8217;s a bunch of interesting algorithmics involved here. The naive O(n3) approach is slower than a slow snail on a slow day. Dynamic programming gets you to an easy-to-understand O(n2) solution, but that&#8217;s still impractically slow for [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-450","post","type-post","status-publish","format-standard","hentry","category-general"],"_links":{"self":[{"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/posts\/450","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/comments?post=450"}],"version-history":[{"count":7,"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/posts\/450\/revisions"}],"predecessor-version":[{"id":457,"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/posts\/450\/revisions\/457"}],"wp:attachment":[{"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/media?parent=450"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/categories?post=450"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.nobugs.org\/blog\/wp-json\/wp\/v2\/tags?post=450"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}