u/FallenMaccaron

Hello one and all,

Anyone want to compare the Speller results from the Problem set 5?

I do not think that I am breaking any rules by posting the results only.

Basically kept chipping away, line by line, idea by idea, printeffing almost everything until I got it to work how I wanted (working at all, that is).
I have chosen to keep my hash function relatively simple (but just enough complicated) to get some results (although the design50 is telling me to research online - which is against what the assignment is saying).
Of course my favorite testing file was holmes.txt, with its 1.15 million words.

If anyone would like to compare the results, here are mine (tested with various N's, to see the difference in times).
holmes.txt with N = 26
WORDS MISSPELLED:     17845
WORDS IN DICTIONARY:  143091
WORDS IN TEXT:        1150970
TIME IN load:         0.04
TIME IN check:        203.65
TIME IN size:         0.00
TIME IN unload:       0.01
TIME IN TOTAL:        203.70

holmes.txt with N = 676 (26^2)
WORDS MISSPELLED:     17845
WORDS IN DICTIONARY:  143091
WORDS IN TEXT:        1150970
TIME IN load:         0.03
TIME IN check:        7.80
TIME IN size:         0.00
TIME IN unload:       0.02
TIME IN TOTAL:        7.85

holmes.txt with N = 17576 (26^3)
WORDS MISSPELLED:     17845
WORDS IN DICTIONARY:  143091
WORDS IN TEXT:        1150970
TIME IN load:         0.04
TIME IN check:        1.35
TIME IN size:         0.00
TIME IN unload:       0.02
TIME IN TOTAL:        1.42

holmes.txt with N = 35152 (26^3)*2
WORDS MISSPELLED:     17845
WORDS IN DICTIONARY:  143091
WORDS IN TEXT:        1150970
TIME IN load:         0.03
TIME IN check:        1.10
TIME IN size:         0.00
TIME IN unload:       0.02
TIME IN TOTAL:        1.15

holmes.txt with N = 143091 (words in dictionary)
WORDS MISSPELLED:     17845
WORDS IN DICTIONARY:  143091
WORDS IN TEXT:        1150970
TIME IN load:         0.05
TIME IN check:        0.99
TIME IN size:         0.00
TIME IN unload:       0.02
TIME IN TOTAL:        1.06

holmes.txt with N = 456976 (26^4)
WORDS MISSPELLED:     17845
WORDS IN DICTIONARY:  143091
WORDS IN TEXT:        1150970
TIME IN load:         0.05
TIME IN check:        0.95
TIME IN size:         0.00
TIME IN unload:       0.02
TIME IN TOTAL:        1.02
Valgrind result:
==59758== 
==59758== HEAP SUMMARY:
==59758==     in use at exit: 0 bytes in 0 blocks
==59758==   total heap usage: 143,096 allocs, 143,096 frees, 8,023,256 bytes allocated
==59758== 
==59758== All heap blocks were freed -- no leaks are possible
==59758== 
==59758== For lists of detected and suppressed errors, rerun with: -s
==59758== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Basically, with my code, the main factor was the amount of hash table buckets - how fast was the check function in finding the word.
I am not smart enough to say that increasing the amount of buckets by 100k to gain 0.1 seconds is advisable, but hey - I tried it.

It would also be interesting to see what would happen if a trie was implemented. I assume the amount of memory would explode (each letter would have 26 or 27 nodes for the ' ). But would it be faster?

PS5 - Speller Assignment Comparison

Anybody mind reviewing my Scratch submission?