PS5 - Speller Assignment Comparison
Hello one and all,
Anyone want to compare the Speller results from the Problem set 5?
I do not think that I am breaking any rules by posting the results only.
Basically kept chipping away, line by line, idea by idea, printeffing almost everything until I got it to work how I wanted (working at all, that is).
I have chosen to keep my hash function relatively simple (but just enough complicated) to get some results (although the design50 is telling me to research online - which is against what the assignment is saying).
Of course my favorite testing file was holmes.txt, with its 1.15 million words.
>
If anyone would like to compare the results, here are mine (tested with various N's, to see the difference in times).
holmes.txt with N = 26
WORDS MISSPELLED: 17845
WORDS IN DICTIONARY: 143091
WORDS IN TEXT: 1150970
TIME IN load: 0.04
TIME IN check: 203.65
TIME IN size: 0.00
TIME IN unload: 0.01
TIME IN TOTAL: 203.70
holmes.txt with N = 676 (26^2)
WORDS MISSPELLED: 17845
WORDS IN DICTIONARY: 143091
WORDS IN TEXT: 1150970
TIME IN load: 0.03
TIME IN check: 7.80
TIME IN size: 0.00
TIME IN unload: 0.02
TIME IN TOTAL: 7.85
holmes.txt with N = 17576 (26^3)
WORDS MISSPELLED: 17845
WORDS IN DICTIONARY: 143091
WORDS IN TEXT: 1150970
TIME IN load: 0.04
TIME IN check: 1.35
TIME IN size: 0.00
TIME IN unload: 0.02
TIME IN TOTAL: 1.42
holmes.txt with N = 35152 (26^3)*2
WORDS MISSPELLED: 17845
WORDS IN DICTIONARY: 143091
WORDS IN TEXT: 1150970
TIME IN load: 0.03
TIME IN check: 1.10
TIME IN size: 0.00
TIME IN unload: 0.02
TIME IN TOTAL: 1.15
holmes.txt with N = 143091 (words in dictionary)
WORDS MISSPELLED: 17845
WORDS IN DICTIONARY: 143091
WORDS IN TEXT: 1150970
TIME IN load: 0.05
TIME IN check: 0.99
TIME IN size: 0.00
TIME IN unload: 0.02
TIME IN TOTAL: 1.06
holmes.txt with N = 456976 (26^4)
WORDS MISSPELLED: 17845
WORDS IN DICTIONARY: 143091
WORDS IN TEXT: 1150970
TIME IN load: 0.05
TIME IN check: 0.95
TIME IN size: 0.00
TIME IN unload: 0.02
TIME IN TOTAL: 1.02
Valgrind result:
==59758==
==59758== HEAP SUMMARY:
==59758== in use at exit: 0 bytes in 0 blocks
==59758== total heap usage: 143,096 allocs, 143,096 frees, 8,023,256 bytes allocated
==59758==
==59758== All heap blocks were freed -- no leaks are possible
==59758==
==59758== For lists of detected and suppressed errors, rerun with: -s
==59758== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Basically, with my code, the main factor was the amount of hash table buckets - how fast was the check function in finding the word.
I am not smart enough to say that increasing the amount of buckets by 100k to gain 0.1 seconds is advisable, but hey - I tried it.
It would also be interesting to see what would happen if a trie was implemented. I assume the amount of memory would explode (each letter would have 26 or 27 nodes for the ' ). But would it be faster?