Codebooks Revisited

1. Codebooks

Let's start by looking at a Royal Navy codebook of WWII vintage - British Cypher No. 5. (This replaced the infamous No.3 that had been broken and resulted in so much damage to the Atlantic convoys. See page 196 of Ralph Bennett's 'Behind the Battle' (Sinclair-Stevenson, London, 1994): ' . . . but when on 10 June [1943] the Admiralty at last replaced Naval Cipher No. 3 with No. 5, which proved quite secure, it was plain that the U- boats could never regain their former authority.')

The book comprises a list with space for 10,000 codegroups that represent:

The book contains a worked example in which a message that in ASCII would take 104 bytes, takes 27 codegroups. This approximates to a 1.9:1 text compression, assuming that each symbol takes 14 bits and each letter takes 7 bits. Also, due to multiple choices, there is only one repeated codegroup.

2. Predictive Word Processors

If you scan the web, you'll find a number of Predictive Word Processors that are currently available. These have found a niche market with disabled people who find typing difficult. They could be useful in military and other rugged applications such as message entry in arctic conditions where thick gloves are essential.

In an early example (Mindreader from Brown Bag Software), as each word is being typed, a menu of options is displayed that offers a single numerical key option to complete the word. These options would normally be based on the statistics of word usage but other factors can considered such as context related jargon (cf military ranks, equipment etc).

3. RC4 (so-called Rivest Cipher no. 4)

The basis of this PRNG is a 256 byte table that is transposed every time a random byte is generated. The transposition consists of:

        x = (x + 1) % 256
        y = (y + t[x]) % 256
        swap t[x], t[y]

where x and y are indices to the table t. The table has no repeat entries ie it is a 1 to 1 mapping and so it can be used as a substitution table (S-box) and also as a transposition or permutation table.

Variants - the first line can be modified to:

        x = (x + z) % 256

where z is an odd integer. This effectively remaps the table - but the original characteristics are unchanged.

Also, the second line can be changed to:

        y = (y + t[x ^ y]) % 256

which makes the algorithm very difficult to run backwards. However, the effects of using this in a PRNG are as yet unknown and it should be noted that irreversible algorithms do make cryptanalysis easier.

To initialise the table, firstly each entry is set to its index value and then the key is applied so:

        x = (x + 1) % 256
        y = (y + t[x] + k[ki]) % 256
        swap t[x], t[y]
        ki = (ki +1) % kl

where ki is the index to the key array k and kl is the key length. This sequence is repeated 256 times - although some commentators find this too few. A good compromise might be to repeat the above 512 times then another 256 times using the irreversibility technique, ie with line 2 now:

        y = (y + t[x ^ y] + k[ki]) % 256

4. The One-Time Codebook

If 1., 2. and 3. are linked together, using a common dictionary, there are some advantages. One is a reduction in storage for messages by feeding the operator preferred options.

However, the main advantage is the ability to create a One-Time Codebook. The main feature of this would be that the placode would consist of a series of randomised codegroups with very few repeats.

Let's assume that the messages generally have a maximum length that is conducive with the number of homophones. (ie the word 'to' shouldn't appear more than 15 times in the message say) Then we can scan the message and reduce repeat codegroups by various techniques such as spelling out repeat words, tagging codegroups that are repeated, tagging codegroups to be arithmetically modified and using codegroups for 'repeat codegroup x'.

Confusion can be added by including codegroups that mean 'ignore the next codegroup' and a variety of null codegroups. (cf the Rossignols' 'Great Cipher')

Finally the codebook is randomised (driven by a one-time session key) using a substitution table and a transposition table - hence the One-Time Codebook notion. It appears as if a new codebook is created for each and every message.

Let's assume a codebook with up to 65536 entries. This means a 65536 by 16-bit S-box. To create the S-box, we use the RC4 approach of starting with an initialised table, then the S-box is permuted under the control of a session key.

Again, using a RC4 approach, but with the table the same size as the number of codegroups in the message, we produce a permutation table from a second session key.

This approach to encryption has some obvious disadvantages in terms of resources required: a 131072 byte S-box, a dictionary of possibly over 500k bytes and the time taken to randomise the S-box. However, this approach does have cryptanalytical strengths in that the symbol sequences are more random than in a letter based system. Superencypherment would no doubt be added for very secure applications.

According to Kahn, (The Codebreakers - section on the Zimmermann Telegram), codebooks are broken by looking for patterns generated by punctuation (full stops) and knowledge of the structure of sentences in terms of nouns and verbs - along with a large volume of traffic. The use of transposition and substitution controlled by one-time session keys should effectively hide these from the analyst.

A simple paper version can be created using a word list. Create one file with each word at the beginning of a line, pad the line with spaces to column 60 (say) and then add the relevant number from the RC4 table. Sort the list using column 1 if necessary. Create the second file with the number first then the padding and finally the word. Again, sort from column 1.


© November 1999, Keith Lockstone

Comments welcome:  Keith

Back to home page crypto,cryptography,codebook,one-time codebook,czczcz, predictive word processor,predictive word processors