u/FallenMaccaron

▲ 0 r/cs50

PS5 - Speller Assignment Comparison

Hello one and all,

Anyone want to compare the Speller results from the Problem set 5?

I do not think that I am breaking any rules by posting the results only.

Basically kept chipping away, line by line, idea by idea, printeffing almost everything until I got it to work how I wanted (working at all, that is).
I have chosen to keep my hash function relatively simple (but just enough complicated) to get some results (although the design50 is telling me to research online - which is against what the assignment is saying).
Of course my favorite testing file was holmes.txt, with its 1.15 million words.

>

If anyone would like to compare the results, here are mine (tested with various N's, to see the difference in times).
holmes.txt with N = 26
WORDS MISSPELLED:     17845
WORDS IN DICTIONARY:  143091
WORDS IN TEXT:        1150970
TIME IN load:         0.04
TIME IN check:        203.65
TIME IN size:         0.00
TIME IN unload:       0.01
TIME IN TOTAL:        203.70

holmes.txt with N = 676 (26^2)
WORDS MISSPELLED:     17845
WORDS IN DICTIONARY:  143091
WORDS IN TEXT:        1150970
TIME IN load:         0.03
TIME IN check:        7.80
TIME IN size:         0.00
TIME IN unload:       0.02
TIME IN TOTAL:        7.85

holmes.txt with N = 17576 (26^3)
WORDS MISSPELLED:     17845
WORDS IN DICTIONARY:  143091
WORDS IN TEXT:        1150970
TIME IN load:         0.04
TIME IN check:        1.35
TIME IN size:         0.00
TIME IN unload:       0.02
TIME IN TOTAL:        1.42

holmes.txt with N = 35152 (26^3)*2
WORDS MISSPELLED:     17845
WORDS IN DICTIONARY:  143091
WORDS IN TEXT:        1150970
TIME IN load:         0.03
TIME IN check:        1.10
TIME IN size:         0.00
TIME IN unload:       0.02
TIME IN TOTAL:        1.15

holmes.txt with N = 143091 (words in dictionary)
WORDS MISSPELLED:     17845
WORDS IN DICTIONARY:  143091
WORDS IN TEXT:        1150970
TIME IN load:         0.05
TIME IN check:        0.99
TIME IN size:         0.00
TIME IN unload:       0.02
TIME IN TOTAL:        1.06

holmes.txt with N = 456976 (26^4)
WORDS MISSPELLED:     17845
WORDS IN DICTIONARY:  143091
WORDS IN TEXT:        1150970
TIME IN load:         0.05
TIME IN check:        0.95
TIME IN size:         0.00
TIME IN unload:       0.02
TIME IN TOTAL:        1.02
Valgrind result:
==59758== 
==59758== HEAP SUMMARY:
==59758==     in use at exit: 0 bytes in 0 blocks
==59758==   total heap usage: 143,096 allocs, 143,096 frees, 8,023,256 bytes allocated
==59758== 
==59758== All heap blocks were freed -- no leaks are possible
==59758== 
==59758== For lists of detected and suppressed errors, rerun with: -s
==59758== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Basically, with my code, the main factor was the amount of hash table buckets - how fast was the check function in finding the word.
I am not smart enough to say that increasing the amount of buckets by 100k to gain 0.1 seconds is advisable, but hey - I tried it.

It would also be interesting to see what would happen if a trie was implemented. I assume the amount of memory would explode (each letter would have 26 or 27 nodes for the ' ). But would it be faster?

reddit.com
u/FallenMaccaron — 2 days ago