2007-09-25

Ketchup #3: Another week, another cs problem

Well this week and weekend has flown by again.. with lots of things I keep saying "Oooh I should write that down" and then by the time I get to a computer.. go "What was that?" or "Crap I need to get that server back up."

So this weekend, we adopted the dog we had babysat a weekend ago. The family came down on Sunday, and dropped off KC who was bounding out of the car and went to check that her chew toy was still in the kid's room. (It was.) It has been walks 2x a day and getting myself used to a schedule of getting up early enough to get here up the trails and me back before I am late to work. So far this week it is Days On Time: 1, Days Off Time: 1. Will see how that progresses.

We got our second large programming assignment this week. It is basically a specialized stenography problem. Take a dictionary, and find the words that hide the word that is given to you. Make sure it covers all cases, make it as optimal as possible. So I spent a lot of the weekend and last night looking over various Java data structures that would help me accomplish this in the most optimized lookup time. The professor upped the anty yesterday by adding a prize to the person who gets the top 3 spots, having to beat him and the T.A. also. So far the lookup costs look to be:

  1. comparing the Given Word (GW) against any given Dictionary Word (DW). Given Word has a length of M and Dictionary Word needs to be at least M in size. The easiest solution would be to loop over the letters in GW and see if they exist in DW taking in account of duplicate letters.
  2. finding words that would make good guesses to compare against versus going through the dictionary word by word to find a choice.
My insane option was to create a bitmap of every word in the dictionary to make good guesses against. Since we are dealling with UTF-16 characters and I want to see if a letter resides in it.. the key would be a 65536 'bit' string, the data would be a tree of words meeting that mask. A word would be converted into said bit map ( mapper would be something 1 for a, 1 for m, 1 for p, 1 for e , 1 for the r 'bit') and then do a bit against that and the query word so that pam, ampre , etc would look down that shorter list of words for possible matches. The word mappp would also go to that queue but would not match against mapper but might match against something else in the queue. However, as I said it was an insane idea.. I have to come up with something else.

I did look up trie (short for retrieval) trees as they are used a lot in dictionary lookups but that doesnt seem to meet the goal as it seems to be better for ordered lookups versus anagram like lookups. Will post more later.. back to a server that died.