
Doubled glyphs and recurring letters
The first nine lines of Canto 1 of La Divina Commedia, in the Foligno edition of 1472.
In previous articles on this platform, I have advanced the hypothesis that the scribes who wrote the text of the Voynich manuscript could have worked in a two-stage process, as follows:
- transcribing letters in the source documents to glyphs, word by word, according to a mapping prescribed by the producer;
- within each transcribed “word”, re-ordering the glyphs according to a “slot alphabet”, also prescribed by the producer.
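The two stages above can be sketched in a few lines of Python. This is only a toy illustration: the mapping `TOY_MAPPING` and the slot alphabet `TOY_SLOTS` are invented placeholders (capital letters stand in for glyphs), not values proposed for the manuscript.

```python
# Toy values, invented purely for illustration; capital letters stand in for glyphs.
TOY_MAPPING = {"s": "A", "m": "B", "a": "C", "r": "D", "i": "E", "t": "F"}
TOY_SLOTS = "FDBECA"  # hypothetical slot alphabet: the order in which glyphs must appear


def transcribe(word, mapping):
    """Stage 1: replace each source letter with its glyph (one-to-one mapping)."""
    return [mapping[letter] for letter in word]


def reorder(glyphs, slot_alphabet):
    """Stage 2: sort the glyphs of one word by their position in the slot alphabet."""
    rank = {glyph: i for i, glyph in enumerate(slot_alphabet)}
    return sorted(glyphs, key=lambda glyph: rank[glyph])


def encode_line(line, mapping, slot_alphabet):
    """Apply both stages to every whitespace-separated word in a line."""
    return ["".join(reorder(transcribe(word, mapping), slot_alphabet))
            for word in line.split()]


print(encode_line("smarrita", TOY_MAPPING, TOY_SLOTS))
```

Because stage 2 sorts the glyphs, any letter that recurs in the source word ends up as a run of adjacent identical glyphs in the output.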
I gave an example, drawn from Dante Alighieri’s La Divina Commedia, in which I applied such a process to a line of Italian text. This yielded a sequence of glyph strings in which the majority were real Voynich “words”.
In advancing this hypothesis, I have proposed that the re-ordering had to follow the transcription, and not precede it. My assumption is that the producer wanted the manuscript to be readable, but only by readers who possessed the mapping.
If the re-ordering preceded the transcription, the scribe would have to follow some systematic sequence that was applicable to the source language. We could think of this sequence as an alphabet. It could be the accepted alphabet of that language (which, from the producer's viewpoint, would be the simplest instruction). In any case, the leftmost letters in each word would be among the first letters in that alphabet.
For example, the first line of La Divina Commedia, if the words are spelled as in the fifteenth century and written in full, is:
>
Let us suppose that the producer instructed a scribe to sort the letters in each word according to the Latin alphabet, retaining the case sensitivity. Thereby the line became:
>
The scribe then mapped this re-ordered text to glyphs.
Whatever the mapping from letters to glyphs might be, a reader might guess that in the fourth, sixth and seventh “words”, the leftmost glyph represented an “a”. Other glyphs might well follow, and the mapping would be cracked.
On the other hand, suppose that the producer gave the scribes a “slot alphabet” of his or her own devising, to be applied to the transcribed strings of glyphs. In this case, the ordinary reader could not guess the letters that the glyphs represented. The meaning of the manuscript would then be accessible only to the privileged possessors of the key. If that key were lost, the manuscript would become unreadable, and for, let us say, six hundred years, would remain so.
Mapping recurring letters
The reasonable question may be raised as to what the scribe would do when transcribing words in which a letter recurs – either as a doubled letter, or as a recurrence elsewhere in the word.
Again we may draw an example from La Divina Commedia, where in the 1472 edition, on the third line, we see the word “diricta” with two “i”s, and “smarrita” with two “a”s and a doubled “r”. If a scribe maps this line to a sequence of glyphs, and if the mapping is one-to-one, and if the glyphs are then re-ordered, then the two “i”s, the two “a”s and the two “r”s will each become a doubled glyph.
The question then becomes: does the Voynich manuscript contain enough doubled glyphs to represent letters that recur within words?
This is an empirical question. Our first step is to determine how often there are instances of recurring letters within words in natural languages. No doubt, the frequency of this phenomenon differs from one language to another. I have started, as is my wont, with medieval Italian as represented by La Divina Commedia.
One easy way to count letter recurrences is to take the text and run it through an online letter sorting application such as https://onlinetexttools.com/sort-letters-in-words. This process can be done with case-sensitivity (as I have done), or without. This results in a new text in which, for example, the first nine lines of La Divina Commedia in the 1472 edition are re-ordered as follows:
| original | letters sorted within words |
|---|---|
| Nel mezo del camin dinrã uita | Nel emoz del acimn dinrã aitu |
| mi trouai puna selua oscura, | im aiortu anpu aelsu acorsu, |
| che la diriπa uia era smarrita. | ceh al adiirπ aiu aer aaimrrst. |
| Et quanto a dir q”lera e cosa dura | Et anoqtu a dir q”aelr e acos adru |
| esta selua seluagia e aspra e forte | aest aelsu aaegilsu e aaprs e efort |
| che nel pensier rinoua la paura! | ceh eln eeinprs ainoru al aapru! |
| Tant e amara che pocho e piu morte; | Tant e aaamr ceh choop e ipu emort; |
| ma p traπar del ben ch i ui trouai, | am p aarrtπ del ben ch i iu aiortu, |
| diro de l altre cose ch i u ho scorte. | dior de l aelrt ceos ch i u ho ceorst. |
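The same re-ordering can be reproduced locally. The sketch below assumes that the online tool sorts each unbroken run of letters by character code, with capitals before lower-case letters and with punctuation left in place; that assumption is consistent with the rows shown in the table, but it is not documented behavior of the tool. The function name is my own.

```python
import re

# A "word" here is a maximal run of Unicode letters; digits, underscores,
# punctuation and spaces are left exactly where they are.
LETTER_RUN = re.compile(r"[^\W\d_]+")


def sort_letters_in_words(text):
    """Sort the letters inside each run of letters by code point (case-sensitive)."""
    return LETTER_RUN.sub(lambda match: "".join(sorted(match.group())), text)


print(sort_letters_in_words("Nel mezo del camin"))
print(sort_letters_in_words("che la smarrita."))
```

Sorting by code point places characters such as "ã" or "π" after the plain Latin letters, which matches their position in the sorted column of the table.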
The next step is to count, in the re-ordered text, the occurrences of bigrams such as “aa” or trigrams such as “aaa”, using for example the “find” function in Microsoft Word. We can then identify the letters that recur within words, and how often they recur. In La Divina Commedia, the most common such letters are listed below.
From this exercise we see that of the 396,445 letters in La Divina Commedia, 76,932 letters (or 19.4 percent of the total) are letters which occur at least twice within a single word. These include 7,671 letters (or 1.9 percent) which occur three times or more within a single word.
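The counts themselves can also be obtained without the sorting step or a manual search. The following is a minimal sketch, assuming that a "word" is any unbroken run of letters, that counting is case-sensitive, and that every occurrence of a recurring letter is counted; the function name is my own, and its results on a full text will depend on how that text is tokenized.

```python
import re
from collections import Counter


def recurrence_counts(text):
    """Return (total letters, letters occurring 2+ times in their word,
    letters occurring 3+ times in their word)."""
    total = twice = thrice = 0
    for word in re.findall(r"[^\W\d_]+", text):
        counts = Counter(word)  # case-sensitive count of each letter in this word
        total += len(word)
        twice += sum(n for n in counts.values() if n >= 2)
        thrice += sum(n for n in counts.values() if n >= 3)
    return total, twice, thrice


total, twice, thrice = recurrence_counts("smarrita amara")
print(total, twice, thrice)
print(f"{100 * twice / total:.1f} percent of letters recur within their word")
```

In "smarrita", the two "a"s and the two "r"s contribute four letters to the second count; in "amara", the three "a"s contribute three letters to both the second and the third count.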
We can now turn to the Voynich manuscript, and investigate whether its glyph counts can replicate this phenomenon.
v101
Of the conventional transliterations of the Voynich manuscript, Glen Claston’s v101 is my favorite, for its minimalist assumptions. In v101, there is a total of 158,940 glyphs by Claston’s definitions. Very few glyphs are explicitly doubled. There is only one that occurs as a double more than 100 times, and that is {c}; the string {cc} occurs 1,686 times. If we count all explicit occurrences of double or triple glyphs, they account for only 3,612 glyphs, or 2.3 percent of the total glyph count.
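A count of this kind can be sketched as follows, assuming that the transliteration has been loaded as a list of "words" in which each character is one glyph. The sample words below are invented for illustration and are not taken from v101.

```python
from itertools import groupby


def glyphs_in_runs(word, min_run=2):
    """Count the glyphs in a word that belong to a run of identical adjacent glyphs."""
    run_lengths = (len(list(run)) for _, run in groupby(word))
    return sum(n for n in run_lengths if n >= min_run)


def doubled_and_total(words):
    """Return (glyphs inside doubles or triples, total glyph count)."""
    doubled = sum(glyphs_in_runs(word) for word in words)
    total = sum(len(word) for word in words)
    return doubled, total


sample_words = ["occo", "oe", "ccc9"]  # invented strings, for illustration only
doubled, total = doubled_and_total(sample_words)
print(doubled, total, f"{100 * doubled / total:.1f} percent")
```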
We are entitled then to ask whether the Voynich manuscript, thus defined, can account for recurring letters in natural languages.
I think that the way forward is to reassess Claston’s definitions of glyphs. I believe that Claston himself, in designating his transliteration as v101, intended and expected that other researchers would explore alternative definitions. Thereby they would create new transliterations, which they might number v102, v103 and so on.
To my mind, there are at least twenty v101 glyphs that invite redefinition. One of the most egregious examples is {m}, which occurs 4,112 times in the manuscript. The glyph {m} could be interpreted as a string of three glyphs {iiN} (as in the EVA transliteration); this string contains the doubled glyph {ii}. In that case, here alone we have 4,112 instances of a doubled glyph.
And there are some common glyphs, notably {1}, {2} and the variants of {2}, which have a bifurcate structure and look like they might represent doubled or recurring letters in the presumed precursor languages.
Below is a selection of common v101 glyphs which look as if they could be redefined to represent or contain doubled glyphs.
If we redefine all of the glyphs listed in this table, the total glyph count of the manuscript increases to 184,676; and we have a count of 50,928 glyphs which are components of doubles or triples – equivalent to 27.6 percent of the total glyph count.
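The effect of a redefinition on both counts can be sketched as below. The table `REDEFINITIONS` contains only the one substitution spelled out in the text, {m} to {iiN}; any further entries would have to be added by the reader. The sample words are invented, and each character is assumed to be one glyph.

```python
from itertools import groupby

# Only the substitution given explicitly in the text; other entries are left out.
REDEFINITIONS = {"m": "iiN"}


def redefine(word, table=REDEFINITIONS):
    """Replace each glyph that has a redefinition by its component glyphs."""
    return "".join(table.get(glyph, glyph) for glyph in word)


def doubled_and_total(words):
    """Return (glyphs inside runs of 2 or more identical glyphs, total glyphs)."""
    doubled = total = 0
    for word in words:
        total += len(word)
        for _, run in groupby(word):
            n = len(list(run))
            if n >= 2:
                doubled += n
    return doubled, total


sample = ["am", "occo"]  # invented strings, for illustration only
before = doubled_and_total(sample)
after = doubled_and_total([redefine(word) for word in sample])
print(before, after)
```

Each redefinition raises the total glyph count as well as the count of doubled glyphs, which is why both figures change in the paragraph above.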
Purely on a raw glyph count, we can create a new transliteration in which the number of doubled and tripled glyphs is more than sufficient to account for recurring letters in medieval Italian.
Nevertheless, there must be more complexity in the mapping of recurring letters. The Voynich glyphs, if suitably redefined, allow us to find doubles and triples of {c} and {i}, and possibly of some component of {1}. Italian, on the other hand, has four vowels that pervasively recur within words - namely a, e, i and o. This is largely a consequence of the gender and number endings of Italian nouns. Several Italian consonants - notably n, s, r and c - are often doubled, even in the fifteenth-century spelling, which had fewer doubled letters than modern Italian.
It may follow that Italian is unlikely to be a substrate language of the Voynich manuscript.
As a counter-example, in medieval Arabic as represented by مقدمة (muqaddimah, or "Introduction") of Ibn Khaldun, written in 1377, letters which recur within words account for just 13.6 percent of the total letter count. In part, this is due to the absence of doubled letters in Arabic. A consonant which is to be pronounced as doubled can be marked with the ّ (shaddah), as in مُقَدِّـمَـة, but in practice the ّ is often omitted.
In Arabic, two letters in particular are prone to recur: ا (alef) and ل (lam). We might be encouraged to look for correspondences with the Voynich doubled glyphs {ii} and {cc}, on a suitable redefinition of the glyphs. As I mentioned in an earlier article on this platform, Arabic is one of the most encouraging languages in terms of mapping the top ten Voynich "words" to real words.
In any case, we need to ask to what extent letters recur within words in other medieval languages. To answer this question, we would need to apply the process that I outlined above, to representative documents in those languages. This is not a difficult task, and will be the subject of a future article on this platform.