Suprisingly Hard Classical Systems
TLDR; I had some fun playing around with combining columnar transposition with substitution, which turns out to be surprisingly strong. Almost certainly not "properly strong", but not trivial to break with computers either. I was really surprised by this and would love to heart your ideas how this might be attacked more properly.
I'm a physicist with background in stats & ML and an interest in cryptography and I got nerdsniped pretty hard a while ago by this quote on wikipedia
> For example, a simple substitution cipher combined with a columnar transposition avoids the weakness of both.
I've always found doulbe columnar transposition pretty interesting because Lasry's 2014 paper came out while I was learning about crypto in school. As far as I can tell, outside of special cases, the methods from that paper and their 2016 follow up paper are still reasonably representative of the state of the art. Though scoring functions and key generation have improved.
After seeing that quote on wikipedia I decided to play around with the problem of adding a substitution step such that:
- the 2016 methods fail using a PC and one day attack time
- the method can be carried out by hand using no aids that can't be created on the fly
- the cipher text passes the NIST STS
- residues under perturbation of plain text or keys pass NIST STS
And I think I succeeded! Though just to be clear, I'm not claiming the resulting method has any practical meaning, it's likely breakable using HPC or if experts ever decided to invest lots of work into it. It was purely a toy project.
To make the substitution nicely compatible with the classical "Doppelwuerfel" it needed to use a q-ary alphabet instead of binary, so I also needed to adapt the NIST STS to F_q with a null hypothesis of x_i ~ U[q] instead of F_2 and U[{0,1}].
A lot of the properties checked by the STS are only going to be provided by the substitution, in particular frequency and cascade behavior. So it was fairly obvious that the method needed to be stateful, keyed and non-linear. The simplest possible thing I could come up with was
[ y_i = (k[y_{i-1}] + x_i + y_{i-1})^n ]
where gcd(n, q-1) = 1, and k[j] means the j-th letter of the substitution keyword and F_q* is used for the alphabet. This turned out to be fairly doable by hand using a power table created on the fly. In order for perturbations to propagate forward and backward this pre-whitening was applied to the text twice, once in forward once in backward direction.
This pre-whitening for q = 58, n = 3 was actually sufficient to fullfil my goals! 58 was chosen because it was the smallest prime q with gcd(q-1, 3)=1 with q-1 > 2*26 (I wanted upper and lower plus some special characters).
After running the STS tests I implemented the methods as described in the papers mentioned above and ran them against pure double columnar transposition, which gave a sufficient partial solve within an hour using 2 keys of length ~20 and ~1000 characters of text. So I could be reasonably sure that my implementation was not completely terrible.
I then adapted the attack to the whitened text by assuming the attacker knows the whitening key and margninalising the information over the q^2 hidden systes for each n-gram that is scored. I also tested straight decryption of the candidate permutation. Both of which failed to produce a partial solve in one day time. It's pretty clear that the hidden states obscure one character worth of information each, but just to be sure I computed the table of all 2-grams and a sample of all 3-grams with all q^2 hidden states to check that 0 or 1 character worth of information get leaked, which was the case. So the loss landscape for the hill climb because much less informative. As an alternative to marginalizing over the states one can guess and n+2-gram in order to score the central n-gram. But that turned out to not be tractable for 4-grams on my hardware. I also tested NN based scoring for language fragments, but out of lazyness used llama 3.2 1B which worked for straight Doppelwuerfel, but was slower than the classical method and failed the same for the whitened one.
I was honestly surprised that it was that easy to beat those methods, I expected the raw power of modern GPUs (RTX 4080 in my case) to be sufficient, but I suspect this whitening is actually sufficient to make the loss landscape awkwardly noisy. I'm not sure how I could prove that though. I'm also almost certain that with better methods it's breakable. If you have ideas I'd love to try them out!