A CLASS OF STEGANOGRAPHIC PROTOCOLS

Minimum Protocols for the Insertion of Messages
into Random or Pseudorandom Data

(Posted to sci.crypt.research, 29th Nov 1994)

[The term STEGOTEXT has been coined to denote the result of a Steganographic process - cf Ciphertext]

1. INTRODUCTION

Steganography is the art of hiding one message in another. In this note, the host message is defined as a block of random data - available to both sender and receiver. This, arguably, also classifies the following schemes as varieties of stream cipher.

One use of this class could be to foil traffic analysis - the sender can automatically queue messages, randomise the apparent message rate and adjust the actual message rate accordingly. This assumes sufficient bandwidth to cope with messaging requirements.

2. SCOPE

These comments apply to a group of protocols that embed a message, one bit at a time, in a block of (pseudo)random data. Implicit in these schemes is that both the sender and receiver have copies of, or can synthesise, the random data stream.

The message should be compressed and then ciphered using standard techniques before the steganography is applied. The more random the message appears, the more difficult the attack will be. The requirement for message randomness may be fulfilled by a relatively weak cipher in this context.

It is also assumed that the ratio of message to random data is quite low ie less than 1:10.

3. GENERAL

The more protocol information embedded in a stegotext the more danger there is of a compromise of the message.

The sender must be able to vary certain parameters of the stegotext such as the message to random data ratio. This imposes limits on any implied protocols.

Factors effecting these parameters could be message randomness, the randomness of the 'random data', storage capacity, transmission channel capacity, the time taken for transmission and maintaining constant or confusing levels of traffic.

If the random data source is not random ie it is a distinct message such as a picture or a sound file, then the requirement would be to only modify certain bits in the file. An implicit protocol may use (say) the least significant bit of a 24-bit graphics file. This bit is probably a 'noise' bit anyway, but the protocol would have to use every least significant bit. The protocols described here allow the sender to randomly determine which bits are used.

Where there is a high random data to message ratio, the random data is totally exposed to analysis. Thus cryptographically very strong pseudorandom generators must be used.

The apparent randomness of the stegotext is probably the most important overall factor.

4. PROTOCOLS

In the following descriptions, it is assumed that there is an independent process that randomly moves a pointer along the random data to select the next protocol event ie where the next message bit is to be insinuated. If the selected point is not suitable, then the pointer will be moved along incrementally until conditions are suitable.

The algorithm for moving the pointer is constrained by the needs of the message to random data ratio.

The following examples attempt to explore some possibilities and are presented in an evolutionary order ie that of increasing complexity.

4.1 Bit Inversion

In this technique, a bit of particular value in the random data is chosen and then inverted. Ie a message bit of '1' is implied if a '0' is turned into a '1' and vice versa:

    1:      R1, R2, 0, R4, R5 -> R1, R2, 1, R4, R5
    0:      R1, R2, 1, R4, R5 -> R1, R2, 0, R4, R5
                    ^                    ^

The message is easily extracted by comparing the random data with the stegotext and noting every bit that is different plus its value.

4.2 Bit Insertion

Here, the bit at the insertion point is chosen to be of opposite value to the message bit. The message bit is inserted just prior to the selected bit:

    1:      R1, R2, 0, R4, R5 -> R1, R2, 1, 0, R4, R5
    0:      R1, R2, 1, R4, R5 -> R1, R2, 0, 1, R4, R5
                    ^                    ^  ^

Again, the message is easily extracted by comparing the random data with the stegotext and noting where a bit is different plus its value. The stegotext is then shifted by one bit to regain alignment and the comparison is continued to find the next bit.

4.3 Bit Deletion

The bits at the deletion point are chosen to be either a '01' or a '10' pair, depending on the value of the message bit. In the case of a '1', a '10' pair is chosen and the '1' of the pair is deleted:

    1:      R1, R2, 1, 0, R5, R6 -> R1, R2, 0, R5, R6
    0:      R1, R2, 0, 1, R5, R6 -> R1, R2, 1, R5, R6
                    ^  ^                    ^

The message is easily extracted by comparing the random data with the stegotext and noting where a bit is different plus its value. The stegotext is then shifted by one bit to regain alignment and the comparison is continued to find the next bit.

4.4 Flag Bit

Here, the selected bit is inverted, but with the proviso that the following bit coincides with the message bit:

    1:      R1, R2, R3, 1, R5 -> R1, R2, ~R3, 1, R5
    0:      R1, R2, R3, 0, R5 -> R1, R2, ~R3, 0, R5
                    ^   ^                 ^   ^

Where ~ denotes inversion. To extract the message, compare the random data with the stegotext and note the bit after each comparison failure.

4.5 Threshold Bits

This method is similar to the flag method. However, instead of taking the value of just one bit after the inversion flag, a threshold decision is made on the subsequent n random data bits where n is an odd number. Thus if n = 3, the message bit is '0' if only one or none of the 3 bits is a '1'. Two or three random bits set to '1' would indicate a message bit of value '1':

    R1, R2, R3, T1, T2, T3, R7, R8 ->
            ^   ^   ^   ^     R1, R2, ~R3, T1, T2, T3, R7, R8 
                                       ^   ^   ^   ^
    0:      T1, T2, T3 = 000, 100, 010 or 001
    1:      T1, T2, T3 = 110, 101, 011 or 111

4.6 Look-Up Table

In similar vein to the threshold scheme, the parity of, or any other n:1 bit function of a number of bits in the random data stream could determine the message bit.

However, as a generalisation, a table look-up system could be used to determine the message bit from a group of random data bits. If n bits were used, then there would be a 2ⁿ by one bit look-up table.

4.7 Dynamic Look-Up Table

Leading on from 4.6, the look-up table could be dynamically modified each time it was used. This would probably be accomplished by swapping the entry that had just been used with one chosen at random.

It should be noted that although bit inversion has been used to indicate the protocol event, bit insertion and bit deletion schemes can be used just as well.

4.8 Implied Dynamic Look-Up Table

As the receiver already has the table index, there is no reason why it should be transmitted, thus reducing bandwidth. In this case, the table is positioned after the inversion flag and, during transmission, the table bits are omitted:

    R1, R2, R3, T1, T2, T3, R7, R8 -> R1, R2, ~R3, R7, R8 
            ^   ^   ^   ^                      ^

where T1, T2 and T3 are used for the address of an eight entry dynamic look-up table that determines the message bit. For completeness, it should be noted that there could also be an Implied Look-Up Table scheme.

4.9 Protocol Control Channel

In these schemes there is a need for data other than message data to be transmitted. Examples might be messages to indicate that the size of look-up table or the encoding scheme is about to change.

This can be accomplished by setting certain parameters outside of their normal bounds. If, say, there is a minimum limit set for the number of random bits that are transmitted between protocol events, then breaking this rule can be used as a message in itself or indicate that a larger message is immanent.

4.10 Indirect Dynamic Look-up Table

So far in these schemes, either all or part of the random data has been transmitted. In the case where a pseudorandom generator is used, if sufficient data is captured, an evesdropper may be able to guess the structure and state of the generator. This can be made much more difficult by sending a series of numbers that represent the distances between protocol events.

This approach and the Implied Dynamic Look-up Table scheme reduce the amount of data transmitted and thus the capability of these schemes to defeat traffic analysis. In fact, it may be better to improve their efficiency and define them as a class of lossy cipher combiners.

4.11 Event Span

I am indebted to Bruce Christianson for this suggestion.

The span or distance between protocol events is used as an index into a dynamic look-up table:

    R1, R2, R3, R4, R5, R6 -> R1, R2, ~R3, R4, ~R5, R6
            ^       ^                  ^        ^

Here the indicated events are two bits apart and the number two can then be used with a dynamic look-up table to produce the next message bit.

Comments welcome: Keith

Back to home page _{cipher,crypto,cryptography,czczcz,cipher design,
encryption,steganography}