u/Technical_Bar6829

Doubled glyphs and recurring letters

The first nine lines of Canto 1 of La Divina Commedia, in the Foligno edition of 1472.

In previous articles on this platform, I have advanced the hypothesis that the scribes who wrote the text of the Voynich manuscript could have worked in a two-stage process, as follows:

  • transcribing letters in the source documents to glyphs, word by word, according to a mapping prescribed by the producer;
  • within each transcribed “word”, re-ordering the glyphs according to a “slot alphabet”, also prescribed by the producer.

I gave an example, drawn from Dante Alighieri’s La Divina Commedia, in which I applied such a process to a line of Italian text. This yielded a sequence of glyph strings in which the majority were real Voynich “words”.

In advancing this hypothesis, I have proposed that the re-ordering had to follow the transcription, and not precede it. My assumption is that the producer wanted the manuscript to be readable, but only by readers who possessed the mapping.

If the re-ordering preceded the transcription, the scribe would have to follow some systematic sequence that was applicable to the source language. We could think of this sequence as an alphabet. It could be the accepted alphabet of that language (which, from the producer's viewpoint, would be the simplest instruction). In any case, the leftmost letters in each re-ordered word would be among the first letters in that alphabet.

For example, the first line of La Divina Commedia, if the words are spelled as in the fifteenth century and written in full, is:

> Nel mezo del camin di nostra uita

Let us suppose that the producer instructed a scribe to sort the letters in each word according to the Latin alphabet, retaining the case sensitivity. Thereby the line became:

> Nel emoz del acimn di anorst aitu

The scribe then mapped this re-ordered text to glyphs.

Whatever the mapping from letters to glyphs might be, a reader might guess that in the fourth, sixth and seventh “words”, the leftmost glyph represented an “a”. Other glyphs might well follow, and the mapping would be cracked.

On the other hand, suppose that the producer gave the scribes a “slot alphabet” of his or her own devising, to be applied to the transcribed strings of glyphs. In this case, the ordinary reader could not guess the letters that the glyphs represented. The meaning of the manuscript would then be accessible only to the privileged possessors of the key. If that key were lost, the manuscript would become unreadable, and for, let us say, six hundred years, would remain so.

Mapping recurring letters

The reasonable question may be raised as to what the scribe would do when transcribing words in which a letter recurs – either as a doubled letter, or as a recurrence elsewhere in the word.

Again we may draw an example from La Divina Commedia, where in the 1472 edition, on the third line, we see the word “diricta” with two “i”s, and “smarrita” with two “a”s and a doubled “r”. If a scribe maps this line to a sequence of glyphs, and if the mapping is one-to-one, and if the glyphs are then re-ordered, then the two “i”s, the two “a”s and the two “r”s will each become a doubled glyph.

The question then becomes: does the Voynich manuscript contain enough doubled glyphs to represent letters that recur within words?

This is an empirical question. Our first step is to determine how often there are instances of recurring letters within words in natural languages. No doubt, the frequency of this phenomenon differs from one language to another. I have started, as is my wont, with medieval Italian as represented by La Divina Commedia.

One easy way to count letter recurrences is to take the text and run it through an online letter sorting application such as https://onlinetexttools.com/sort-letters-in-words. This process can be done with case-sensitivity (as I have done), or without. This results in a new text in which, for example, the first nine lines of La Divina Commedia in the 1472 edition are re-ordered as follows:

| original | letters sorted within words |
|---|---|
| Nel mezo del camin dinrã uita | Nel emoz del acimn dinrã aitu |
| mi trouai puna selua oscura, | im aiortu anpu aelsu acorsu, |
| che la diriπa uia era smarrita. | ceh al adiirπ aiu aer aaimrrst. |
| Et quanto a dir q”lera e cosa dura | Et anoqtu a dir q”aelr e acos adru |
| esta selua seluagia e aspra e forte | aest aelsu aaegilsu e aaprs e efort |
| che nel pensier rinoua la paura! | ceh eln eeinprs ainoru al aapru! |
| Tant e amara che pocho e piu morte; | Tant e aaamr ceh choop e ipu emort; |
| ma p traπar del ben ch i ui trouai, | am p aarrtπ del ben ch i iu aiortu, |
| diro de l altre cose ch i u ho scorte. | dior de l aelrt ceos ch i u ho ceorst. |
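The re-ordering can also be reproduced offline. Below is a minimal Python sketch, not the code behind the online tool: the function name `sort_letters_in_words` is my own, and I am assuming that the tool sorts each unbroken run of letters by character code and leaves punctuation where it stands, which is consistent with the sample lines above.

```python
import re
import unicodedata

# Maximal runs of word characters, excluding digits and underscore
# (in practice: runs of letters, including accented and Greek ones).
LETTER_RUN = re.compile(r"[^\W\d_]+")


def sort_letters_in_words(text: str, case_sensitive: bool = True) -> str:
    """Sort the letters inside every run of letters; leave everything else in place."""
    # Compose accented letters (a + combining tilde) so each is one character.
    text = unicodedata.normalize("NFC", text)
    # Case-sensitive sorting uses raw character codes, so capitals come first.
    key = None if case_sensitive else str.lower

    def _sort_run(match: re.Match) -> str:
        return "".join(sorted(match.group(0), key=key))

    return LETTER_RUN.sub(_sort_run, text)


print(sort_letters_in_words("Nel mezo del camin dinr\u00e3 uita"))
# Nel emoz del acimn dinrã aitu
```

Because the sort is by character code, "ã" and "π" land after the unaccented Latin letters, as they do in the sample.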

The next step is to count, in the re-ordered text, the occurrences of bigrams such as “aa” or trigrams such as “aaa”, using for example the “find” function in Microsoft Word. We can then identify the letters that recur within words, and how often they recur. In La Divina Commedia, the most common such letters are listed below.

Counts of doubled or tripled letters in La Divina Commedia, after alphabetic sorting of letters within words. Totals refer to all recurring letters, including those not listed in this extract.

From this exercise we see that of the 396,445 letters in La Divina Commedia, 76,932 letters (or 19.4 percent of the total) are letters which occur at least twice within a single word. These include 7,671 letters (or 1.9 percent) which occur three times or more within a single word.
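Counts of this kind can also be taken directly, without the sort-and-find steps. The sketch below is an alternative route, not the procedure I used. It assumes that the quantity of interest is the number of letters belonging to a letter type that occurs at least twice (or at least three times) in the same word; the function name `recurrence_stats` is hypothetical.

```python
import re
from collections import Counter


def recurrence_stats(text: str) -> tuple[int, int, int]:
    """Return (total letters, letters occurring 2+ times in their word, 3+ times)."""
    total = twice = thrice = 0
    for word in re.findall(r"[^\W\d_]+", text):
        counts = Counter(word)  # case-sensitive, like the sorting step
        total += len(word)
        twice += sum(n for n in counts.values() if n >= 2)
        thrice += sum(n for n in counts.values() if n >= 3)
    return total, twice, thrice


print(recurrence_stats("smarrita amara"))
# (13, 7, 3)

# Arithmetic check of the percentages reported for La Divina Commedia.
print(round(76932 / 396445 * 100, 1), round(7671 / 396445 * 100, 1))
# 19.4 1.9
```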

We can now turn to the Voynich manuscript, and investigate whether its glyph counts can replicate this phenomenon.

v101

Of the conventional transliterations of the Voynich manuscript, Glen Claston’s v101 is my favorite, for its minimalist assumptions. In v101, there is a total of 158,940 glyphs by Claston’s definitions. Very few glyphs are explicitly doubled. There is only one that occurs as a double more than 100 times, and that is {c}; the string {cc} occurs 1,686 times. If we count all explicit occurrences of double or triple glyphs, they account for only 3,612 glyphs, or 2.3 percent of the total glyph count.
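For anyone who wants to repeat the count of explicit doubles and triples, here is a minimal sketch. It assumes that the transliteration has already been split into "words" and that each character stands for one glyph; the sample strings are made up for illustration and are not taken from the manuscript.

```python
from itertools import groupby


def glyphs_in_runs(words: list[str], min_run: int = 2) -> tuple[int, int]:
    """Return (total glyphs, glyphs that sit inside a run of identical glyphs)."""
    total = in_runs = 0
    for word in words:
        for _glyph, group in groupby(word):  # consecutive identical characters
            run_length = len(list(group))
            total += run_length
            if run_length >= min_run:
                in_runs += run_length
    return total, in_runs


sample = ["occ9", "cccK", "am"]  # illustrative strings only
print(glyphs_in_runs(sample))
# (10, 5)
```

Note that this counts adjacent repeats only, which is the sense of "explicit" used above.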

We are entitled then to ask whether the Voynich manuscript, thus defined, can account for recurring letters in natural languages.

I think that the way forward is to reassess Claston’s definitions of glyphs. I believe that Claston himself, in designating his transliteration as v101, intended and expected that other researchers would explore alternative definitions. Thereby they would create new transliterations, which they might number v102, v103 and so on.

To my mind, there are at least twenty v101 glyphs that invite redefinition. One of the most egregious examples is {m}, which occurs 4,112 times in the manuscript. The glyph {m} could be interpreted as a string of three glyphs {iiN} (as in the EVA transliteration); this string contains the doubled glyph {ii}. In that case, here alone we have 4,112 instances of a doubled glyph.

And there are some common glyphs, notably {1}, {2} and the variants of {2}, which have a bifurcate structure and look like they might represent doubled or recurring letters in the presumed precursor languages.

Below is a selection of common v101 glyphs which look as if they could be redefined to represent or contain doubled glyphs.

Author’s subjective selection of glyphs which have the visual appearance of representing or containing doubled or tripled glyphs. Author's analysis, based on v101 transliteration. Frequency is calculated as count of glyphs, divided by total glyph count. Totals include some additional glyphs not listed in this table.

If we redefine all of the glyphs listed in this table, the total glyph count of the manuscript increases to 184,676; and we have a count of 50,928 glyphs which are components of doubles or triples – equivalent to 27.6 percent of the total glyph count.
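The effect of a redefinition on the counts can be sketched as follows. The only entry in the table `REDEFINE` is the {m} to {iiN} example given earlier; any further entries would be assumptions to be filled in by the researcher. The function names and sample strings are hypothetical.

```python
from itertools import groupby

# Redefinition table: old glyph -> string of new glyphs.
# Only the {m} -> {iiN} example is included here.
REDEFINE = {"m": "iiN"}


def redefine(word: str, table: dict[str, str] = REDEFINE) -> str:
    """Rewrite one transliterated word under the redefinition table."""
    return "".join(table.get(glyph, glyph) for glyph in word)


def share_in_runs(words: list[str]) -> float:
    """Fraction of glyphs that sit inside a run of two or more identical glyphs."""
    total = in_runs = 0
    for word in words:
        for _glyph, group in groupby(word):
            n = len(list(group))
            total += n
            in_runs += n if n >= 2 else 0
    return in_runs / total if total else 0.0


sample = ["am", "o9"]  # illustrative strings only
print(share_in_runs(sample), share_in_runs([redefine(w) for w in sample]))

# Arithmetic check of the share reported for the redefined transliteration.
print(round(50928 / 184676 * 100, 1))
# 27.6
```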

Purely on a raw glyph count, we can create a new transliteration in which the number of doubled and tripled glyphs is more than sufficient to account for recurring letters in medieval Italian.

Nevertheless, there must be more complexity in the mapping of recurring letters. The Voynich glyphs, if suitably redefined, allow us to find doubles and triples of {c} and {i}, and possibly of some component of {1}. Italian, on the other hand, has four vowels that pervasively recur within words - namely a, e, i and o. This is largely a consequence of the gender and number endings of Italian nouns. Several Italian consonants - notably n, s, r and c - are often doubled, even in the fifteenth-century spelling which had fewer doubled letters than modern Italian.

It may follow that Italian is unlikely to be a substrate language of the Voynich manuscript.

As a counter-example, in medieval Arabic as represented by مقدمة (muqaddimah, or "Introduction") of Ibn Khaldun, written in 1377, letters which recur within words account for just 13.6 percent of the total letter count. In part, this is due to the absence of doubled letters in Arabic. A consonant which is to be pronounced as doubled can be marked with the ّ (shaddah), as in مُقَدِّـمَـة, but in practice the ّ is often omitted.
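One practical point when counting Arabic letters is whether the vowel marks, the shaddah and the tatweel (the elongation stroke) are counted as letters. The sketch below shows one way to strip them before counting. It is an illustration of the issue only; I am not asserting that the 13.6 percent figure depends on it, and the function names are hypothetical.

```python
import unicodedata
from collections import Counter

TATWEEL = "\u0640"  # elongation stroke, not a letter


def base_letters(word: str) -> str:
    """Drop combining marks (vowel signs, shaddah) and tatweel from one word."""
    return "".join(
        ch for ch in word
        if ch != TATWEEL and not unicodedata.combining(ch)
    )


def recurring_letter_count(word: str) -> int:
    """Number of base letters whose letter type occurs at least twice in the word."""
    return sum(n for n in Counter(base_letters(word)).values() if n >= 2)


# The fully marked spelling of "muqaddimah", written with escapes for clarity.
word = "\u0645\u064f\u0642\u064e\u062f\u0651\u0650\u0640\u0645\u064e\u0640\u0629"
print(len(base_letters(word)), recurring_letter_count(word))
# 5 2
```

With the marks stripped, the only recurring letter in this word is the م (mim), which occurs twice; the doubled د (dal) does not appear as a recurrence because it is written once.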

In Arabic, two letters in particular are prone to recur: ا (alef) and ل (lam). We might be encouraged to look for correspondences with the Voynich doubled glyphs {ii} and {cc}, on a suitable redefinition of the glyphs. As I mentioned in an earlier article on this platform, Arabic is one of the most encouraging languages in terms of mapping the top ten Voynich "words" to real words.

In any case, we need to ask to what extent letters recur within words in other medieval languages. To answer this question, we would need to apply the process that I outlined above, to representative documents in those languages. This is not a difficult task, and will be the subject of a future article on this platform.

reddit.com
u/Technical_Bar6829 — 8 days ago

The Voynich transcription

A scribe imagined, transcribing lines from Dante's "La Divina Commedia".

Here is an example of how I imagine the transcription of a medieval document, which resulted in the book that we now know as the Voynich manuscript.

Let us suppose, for the sake of argument, that:

  • a scribe receives Canto 1 of Dante Alighieri's La Divina Commedia, with instructions to transcribe it to the symbols that we now call Voynich glyphs;
  • the principal glyphs are defined as Glen Claston will define them six hundred years later, with the exception that the symbol that Claston will call {4o} is not two glyphs but one;
  • the Italian words are written in full, without the abbreviations and concatenations of the Foligno edition;
  • the producer has prescribed a one-to-one mapping of Latin letters to glyphs, either not knowing or not caring that this mapping will preserve the frequencies of the Latin letters;
  • it follows, with some confidence, that each Latin letter maps approximately to the equally ranked glyph; for example e to {o}, a to {9}, i to {a}, and so on.
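The last assumption, that equally ranked letters and glyphs correspond, can be sketched in a few lines. This is an illustration under stated assumptions: each glyph is represented by a single character (so {4o} would need its own symbol), the function name `rank_mapping` is hypothetical, and the toy inputs are chosen only to reproduce the three pairings mentioned above.

```python
from collections import Counter


def rank_mapping(source_text: str, glyph_text: str) -> dict[str, str]:
    """Pair the n-th most frequent letter with the n-th most frequent glyph."""
    letter_counts = Counter(ch for ch in source_text.lower() if ch.isalpha())
    glyph_counts = Counter(ch for ch in glyph_text if not ch.isspace())
    letters = [letter for letter, _ in letter_counts.most_common()]
    glyphs = [glyph for glyph, _ in glyph_counts.most_common()]
    return dict(zip(letters, glyphs))  # stops at the shorter of the two lists


# Toy inputs, not real corpora: e is most frequent, then a, then i.
print(rank_mapping("eee aa i", "ooo 99 a"))
# {'e': 'o', 'a': '9', 'i': 'a'}
```

On real texts, letters or glyphs with nearly equal frequencies could easily swap ranks, which is why the mapping is only approximate.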

In this scenario, the scribe examines the first line of Canto 1, which consists of seven words:

>nel mezo del camin di nostra uita

and following the mapping that the producer has laid down, he writes the rough transcription as shown below:

https://preview.redd.it/7j7o3pbgug0h1.png?width=806&format=png&auto=webp&s=6559a4035dc2dcc795c0fc2d7add6925c373f00a

He then refers to the producer’s “slot alphabet” for the correct order in which the glyphs must be written. We do not know whether this alphabet was simple or complex; nor whether it was rigid or flexible. We might guess that it embodied rules of the following nature:

  • If the "word" contains the glyph {4o}, write that glyph in the leftmost position.
  • If the "word" contains the glyph {m}, write that glyph in the rightmost position.
  • If the "word" contains the glyph {9}, write that glyph in the rightmost position.
  • The glyphs {c} and {C} can be to the right of, but not to the left of, the glyphs {h} and {k}.
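To make the four guessed rules concrete, here is a minimal sketch that applies them to a "word" given as a list of glyph names. It implements only these four rules and leaves all other glyphs in their transcribed order; the real slot alphabet, if there was one, was surely richer. The function name `reorder` is hypothetical.

```python
LEFTMOST = {"4o"}
RIGHTMOST = {"m", "9"}
HK = {"h", "k"}
CC = {"c", "C"}


def reorder(glyphs: list[str]) -> list[str]:
    """Apply the four guessed slot rules to one word (a list of glyph names)."""
    first = [g for g in glyphs if g in LEFTMOST]
    last = [g for g in glyphs if g in RIGHTMOST]
    middle = [g for g in glyphs if g not in LEFTMOST and g not in RIGHTMOST]

    # Within the positions occupied by h/k/c/C, put h and k before c and C.
    # The sort is stable, so glyphs of the same kind keep their relative order.
    positions = [i for i, g in enumerate(middle) if g in HK or g in CC]
    ordered = sorted((middle[i] for i in positions), key=lambda g: g in CC)
    for i, g in zip(positions, ordered):
        middle[i] = g

    return first + middle + last


print(reorder(["9", "c", "o", "h", "4o"]))
# ['4o', 'h', 'o', 'c', '9']
```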

The scribe's clean copy, which he writes on the vellum, is as shown below:

https://preview.redd.it/vqegga4kug0h1.png?width=806&format=png&auto=webp&s=05894d2f506245dd69db7e37a626c85d99c96db9

Five of these seven “words” are real “words” in the Voynich manuscript; and the other two "words" differ by only one glyph from real Voynich "words".

In practice, we have no reason to believe that the source documents included La Divina Commedia, or even that they were in medieval Italian. My working assumption is that they were in languages that were spoken and written in Europe in the fifteenth century.

However, this exercise demonstrates that a one-to-one mapping of letters to glyphs, coupled with some kind of re-ordering process, can replicate real Voynich "words".

I think that the way forward is to try many candidate languages, and many alternative transliterations of the Voynich manuscript, with all the permutations that this will involve. This is not a manual task; it will necessarily be a massive computational approach. It was just such an approach that cracked the Zodiac cipher.

u/Technical_Bar6829 — 11 days ago

The Voynich instructions

https://preview.redd.it/acxvz07al90h1.png?width=2100&format=png&auto=webp&s=481d7d1b5f1e9bf718610db169ee3ffdb2cbbfcb

Here is how I imagine a wealthy person who lived in the fifteenth century, and whom I call the producer. I have imagined the producer as a man; equally, it could be a woman. The scene is a mansion or palace somewhere in southern Europe. The producer is giving instructions to a team of scribes. Later, the scribes will write the text of the book that we now know as the Voynich manuscript.

As I imagine it, the producer engages at least five scribes. They are professional piece workers, paid by the page. The producer offers them a job which will keep them busy for at least several months, possibly more than a year.

The producer shows the scribes a set of symbols or glyphs that they have never seen before. It does not faze them; they are probably accustomed to writing in Latin or Greek script, possibly Arabic or Glagolitic.

He (or she) also provides a document, or a set of documents, containing some 40,000 words of text in a language and a script that they know. Perhaps he (or she) is the author; perhaps someone else is the author. They may even be well-known documents. It does not matter.

The task of the scribes will be to transcribe the source documents into the unknown alphabet. The producer sets out a set of rules and instructions for the transcription.

For this purpose, the producer will provide over one hundred sheets of calfskin vellum, already profusely illustrated on both recto and verso. The scribes will write the text, above, below and around the illustrations.

As I imagine it, there is one specific instruction. Within each word, the scribes are first to map the letters to the glyphs, and then to re-order the glyphs in a sequence that the producer prescribes. This will create the phenomenon that, six centuries later, Mary D’Imperio will call "the five states", and Massimiliano Zattera will call "the slot alphabet".

If that is so, and if the producer intended the manuscript to be readable, he (or she) introduced an element of ambiguity into the decipherment. For example, the source documents might contain words that were anagrams of each other, like "trovai" and "travio" in Italian. Such words might map to the same string of glyphs.
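The extent of this ambiguity in a given source text can be measured by grouping its words into anagram classes. In the sketch below, alphabetical sorting stands in for the producer's unknown re-ordering; that is an assumption, but any fixed re-ordering that depends only on which glyphs a word contains would produce the same collisions under a one-to-one mapping. The function name `anagram_classes` is hypothetical.

```python
from collections import defaultdict


def anagram_classes(words: list[str]) -> dict[str, list[str]]:
    """Group distinct words that contain exactly the same letters."""
    classes: dict[str, set[str]] = defaultdict(set)
    for word in words:
        classes["".join(sorted(word))].add(word)
    # Keep only the keys shared by two or more distinct words.
    return {key: sorted(group) for key, group in classes.items() if len(group) > 1}


print(anagram_classes(["trovai", "travio", "nel", "mezo"]))
# {'aiortv': ['travio', 'trovai']}
```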

Perhaps the producer expected that the reader, possessing or deducing the mapping between letters and glyphs, would infer the right meaning from the context. It would not be much harder than deciphering the following line:

* eln emoz acdeilmn ãdinr aitu

which readers of fifteenth-century Tuscan-Italian would intuit as:

* nel mezo delcamin dinrã uita,

or the first line of Dante Alighieri's La Divina Commedia, in the Foligno edition of 1472.

u/Technical_Bar6829 — 12 days ago