Some Thoughts on the Design of a Proposed Stream Cipher


Mok-Kong Shen (mok-kong.shen@stud.uni-muenchen.de)
Mon, 18 Jan 1999 17:07:30 +0100


PRELUDE. This is a major rework of an earlier article by the author
entitled 'On the Generation of Pseudo-OTP'. Its title has been changed
because of possible misleading interpretations/connotations of the
term 'pseudo-OTP'.

I. INTRODUCTION.

While the general tendency of modern developments in cryptography has
been towards block ciphers of increasingly large key length, with e.g.
AES demanding a minimum of 128 bits, a recent agreement of the
Wassenaar countries restricts the export of crypto products to those
having key lengths not exceeding 56 bits. This will invariably lead to
a more or less prolonged and very undesirable situation in which
innocent and law-abiding people outside the 33 signatory nations have
only relatively weak ciphers at their disposal, insufficient to offer
an adequate level of security for their private confidential
communications (whether among themselves or with people of the 33
countries). For illegally imported strong crypto products could have
problems of general availability, cost, maintenance, integrity
(viruses/bugs implanted by malicious agencies) and, last but not
least, legality. Moreover, the countries outside the signatory group
are currently on average rather poor in scientific and technical
know-how, so that it would be extremely difficult to build up there,
within a short time, a crypto infrastructure comparable to that of the
technically superior export-regulating nations.

One viable way for people in the said technically underdeveloped
region to escape from the dilemma appears to be to resort to
superencipherment, i.e. an encryption system consisting of a
combination of an easily implementable algorithm with a sufficiently
good 56-bit algorithm (which can be legally obtained from the 33
countries) with the hope that the resulting strength will exceed the
computational capability of the mighty eavesdroppers.
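
To fix ideas, the following small sketch (in Python, purely
illustrative; the callable 'block_cipher_56' is a placeholder for any
legally obtainable 56-bit product cipher and is not implemented here)
shows one possible arrangement of the superencipherment as a simple
composition of the two layers:

def xor_layer(data: bytes, keystream: bytes) -> bytes:
    """XOR the message with an equally long piece of keystream
    (the easily implementable layer)."""
    return bytes(d ^ k for d, k in zip(data, keystream))

def superencipher(plaintext: bytes, keystream: bytes,
                  block_cipher_56) -> bytes:
    """Apply the home-made stream layer first, then the legally
    obtained 56-bit product cipher (a placeholder callable here)."""
    return block_cipher_56(xor_layer(plaintext, keystream))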

It is the purpose of this paper to present some preliminary ideas on
the realization of such an easily implementable complementary
algorithm in the form of a stream cipher whose keystream can be easily
generated by both partners of a given communication connection without
involving any problems of transport and secure storage. The scheme
thus possesses the advantages of immunity to (whatever) export
controls and of simplicity of handling by (possibly unsophisticated)
users. (This is to be contrasted, in particular, with keystreams
obtained from physical random processes, which normally incur severe
management problems of transport and secure storage.)

II. RAW MATERIALS FOR OBTAINING KEYSTREAMS.

For good cryptological security of a stream cipher, the keystream
generated should be hard to infer, i.e. it should be difficult for the
analyst to predict any bit of the keystream from the other bits. As is
well known, this very crucial problem is unfortunately extremely
difficult to deal with. We regret (as the reader may have foreseen
from our introduction) not being able to offer any satisfactory
solution to it in the present paper. Instead we suggest appealing to
the admittedly crude (imprecise) heuristic argument that, by rendering
each bit of the keystream a sufficiently complex (complicated)
function of a sufficiently large set of independent and sufficiently
random values (variables), the dependency of any bit on the other bits
of the entire sequence can be made arbitrarily small, so as to be
beyond the inference capability of the analyst.

Assuming that the above heuristic argument can be accepted and noting
that the dependency can be expressed in terms of auto-correlations, we
suggest using the following coarse plan to generate keystreams for
our purpose: Start with some more or less random raw materials
(sequences of bits/bytes) that are easily available albeit inherently
poor (i.e. comparatively strongly auto-correlated) and apply a
sequence of suitably chosen refining processes (to be treated in the
next section) aimed at reducing the correlations, such that the
end-stage output will be of satisfactory quality (progressive
refinement).

Conceptually this is akin to obtaining potable water from a muddy
source by employing a series of different filtering and disinfecting
processes, each augmenting the functionality of the others, such that
the end product will hopefully be sufficiently free of ingredients
detrimental to human health.
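
In programming terms this progressive refinement is nothing more than
a composition of functions. A minimal sketch (Python; the individual
stage functions are placeholders, to be filled in with the techniques
of the next section):

def refine(raw: bytes, stages) -> bytes:
    """Pass raw material through a chain of refining stages; each
    stage takes a byte sequence and returns a (hopefully less
    correlated) byte sequence."""
    out = raw
    for stage in stages:
        out = stage(out)
    return out

# e.g.: keystream = refine(text_bytes, [substitute, transpose, compress])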

That said, we shall presently look for ensembles of fluctuating
quantities that can be easily and perfectly legally acquired by both
communication partners in identical copies independent of geographical
location and that are apparently in some sense random, i.e. roughly
speaking non-systematic, disordered and without a discernible
periodicity. The following materials suggest themselves:

1. Natural language texts (books, news, public documents, etc.)

2. Artificial language texts (source programs, digit sequences of
   transcendental numbers, PostScript files, etc.)

3. Sequences of data (physical measurements, economic time series,
   telephone numbers, results of numerical computations (with
   identical input, algorithms and computational accuracy), scientific
   data bases, etc.)

4. Other digitized materials (voice, sound, photos, pictures,
   graphics, satellite and medical images, binary program files, etc.)

With the possible exception of the digit sequences of transcendental
numbers, these items are mostly fairly strongly auto-correlated. In
fact, some well-known methods of analysis are based on the frequency
distributions of bigrams, trigrams, etc. of the natural languages in
which the plaintext messages are written. In the next section we shall
therefore deal with a number of techniques aimed at efficiently
reducing the auto-correlations, hopefully leading to keystreams of
satisfactory quality for our purpose. (The envisaged, though certainly
not fully attainable, goal is of course white noise, whose
auto-correlation function is defined to be zero everywhere except at a
single point.)
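
Whether a chosen chain of refining processes actually reduces the
correlations can be checked empirically. The following sketch computes
the standard sample auto-correlation coefficient of a byte sequence at
a given lag (this estimator is common statistical practice, not
something specific to our proposal); for a good keystream the values
at all non-zero lags should be close to zero:

def autocorrelation(seq: bytes, lag: int) -> float:
    """Sample auto-correlation coefficient of a byte sequence at
    'lag' (0 < lag < len(seq))."""
    n = len(seq) - lag
    mean = sum(seq) / len(seq)
    var = sum((x - mean) ** 2 for x in seq) / len(seq)
    if var == 0.0:
        return 1.0       # constant sequence: perfectly correlated
    cov = sum((seq[i] - mean) * (seq[i + lag] - mean)
              for i in range(n)) / n
    return cov / var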

III. TECHNIQUES FOR REDUCING AUTO-CORRELATIONS.

For simplicity of presentation we shall confine ourselves in this
section to natural language texts (sequences of characters in 8-bit
bytes) as raw materials for obtaining keystreams, since these are
very easily available and since the other sources can be treated in
essentially the same manner. The following techniques, largely taken
from classical elementary cryptography and readily implementable by
persons of modest programming proficiency, appear to be promising
candidates for reducing the auto-correlations present in the texts (a
few of them are sketched in code after the list):

1. Apply substitutions, e.g. polyalphabetic substitutions of bytes.

2. Apply permutations, i.e. transpositions, including reversal.

3. Apply homophones, utilizing the fact that the alphabet of the
   texts is smaller than the full 256 character set.

4. Combine two text streams through:

   a. Bitwise XOR.

   b. Addition mod 2^32 (assuming 32 bit word length).

   c. Substitution, e.g. mapping a pair of bytes (or halfbytes), one
      from each stream, to a pair of bytes (or halfbytes).

5. Compression through:

   a. Commonly available data compression algorithms.

   b. XORing groups of bits, e.g. 32 bits, i.e. taking their parity.

6. Use every second or third character of the texts.

7. Apply other simple encryption techniques, e.g. auto-keying
   applied to a group of words and circular shift of bits in a word.
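
Sketches of (4a), (4b) and (5b) above (Python, purely illustrative;
the group size of 32 bits is merely the example value given in the
list):

def combine_xor(a: bytes, b: bytes) -> bytes:
    """(4a) Combine two streams by bitwise XOR."""
    return bytes(x ^ y for x, y in zip(a, b))

def combine_add32(a: bytes, b: bytes) -> bytes:
    """(4b) Combine two streams by addition mod 2^32, one 32-bit
    word at a time (an incomplete final word is dropped)."""
    n = min(len(a), len(b)) // 4 * 4
    out = bytearray()
    for i in range(0, n, 4):
        wa = int.from_bytes(a[i:i+4], 'big')
        wb = int.from_bytes(b[i:i+4], 'big')
        out += ((wa + wb) & 0xFFFFFFFF).to_bytes(4, 'big')
    return bytes(out)

def compress_parity(a: bytes, group_bits: int = 32) -> bytes:
    """(5b) Compress a stream by replacing each group of
    'group_bits' bits with its parity; the parity bits are packed
    back into bytes (an incomplete final byte is dropped)."""
    bits = ''.join(format(byte, '08b') for byte in a)
    par = ''.join(str(bits[i:i + group_bits].count('1') % 2)
                  for i in range(0, len(bits) - group_bits + 1,
                                 group_bits))
    return bytes(int(par[i:i + 8], 2)
                 for i in range(0, len(par) - 7, 8))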

Note that with (4) an arbitrary number of text streams can be combined
and that all techniques can be applied recursively, for example (4b)
can be applied after (5b). Thus the proposed scheme for the
generation of keystreams can apparently indeed be made to be
arbitrarily complex, satisfying the requirement mentioned in our
heuristic argument of Section II. Of course, at this point one
naturally recalls the issue of processing efficiency, which has up
till now been among the essential selection criteria for modern
encryption algorithms. However, as the author recently pointed out
[1,2], one practical and simple way of coping with the 56-bit
limitation is to make use of a new paradigm, namely 'security through
inefficiency', i.e. paying the price of a tolerably longer processing
time. This can certainly be justified in a substantial part of
civilian applications, e.g. encrypting e-mails, and can evidently be
applicable even to certain relatively time-critical encryptions, if
multiple hardware is available for processing parallel jobs.

IV. MISCELLANEOUS REMARKS.

The 'key', i.e. the secret information to be kept by both
communication partners, consists of the names or ISBNs of the texts
being used and the arbitrarily chosen initial offsets from the
beginning of the texts. The keystream is preferably generated online
rather than generated once in great length and used for a number of
subsequent sessions, since otherwise it would have to be securely
stored, resulting in substantial management problems. This means that
the user keeps a running record of the offsets into the individual
texts, i.e. the character positions reached after the previous message
has been processed. (As a variation one could introduce some amount of
skip after each message processed.) No other information need be
securely stored. (The texts can in fact be freely displayed and read
by the user for entertainment when he is not doing encryption.)
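
A minimal sketch of this bookkeeping (Python; the field names and the
example values are of course purely illustrative, not part of any
fixed format):

class KeyRecord:
    """The shared secret: which texts take part and where in each
    text the keystream generation currently stands."""
    def __init__(self, texts, offsets):
        self.texts = texts        # e.g. ISBNs or file names
        self.offsets = offsets    # current positions, one per text

    def advance(self, consumed, skip=0):
        """After processing a message of 'consumed' characters, move
        every offset forward, optionally with the extra skip
        suggested above."""
        self.offsets = [o + consumed + skip for o in self.offsets]

# e.g.: key = KeyRecord(['some-ISBN', 'some-text-file'], [1234, 56789])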

Since one's library may contain quite a large number of texts
(possibly also in several different languages, which is advantageous
from the point of view of combination), each having the potential of
being combined with the others to generate keystreams for encryption,
the mere knowledge of the content of the user's library is virtually
useless to the analyst. Simple combinatorial considerations indicate
that an almost unlimited number of keystreams (or equivalently a
non-repeating (non-periodic) keystream of practically infinite length)
can be generated from a library of non-trivial size; a library of
merely 100 texts combined three at a time, for instance, already
yields C(100,3) = 161700 distinct triples, even before the free choice
of offsets is counted. This is to be contrasted with keystreams
obtained from one or a few pseudo-random number generators (LFSRs
etc., see [3]) that normally have fairly limited period lengths. (For
a compound PRNG designed to accept an arbitrarily long seed and
produce pseudo-random number sequences of arbitrarily long period
lengths, see [4].) In other words, on the assumption that the user
commits no procedural errors, the keystream obtained is of 'one-time'
nature.

Note: The above 'one-time' qualification should not be misunderstood
by the reader as inviting conceptual associations with the one-time
pad (OTP). Our proposed scheme is simply a stream cipher. OTP, on the
other hand, is an ideal and purely theoretical concept that is never
fully realizable in practice, though it is commonly believed that OTP
can be approximated to a satisfactory degree through adequately
post-processing bit sequences obtained from some physical random
processes. (Our comparatively much less ambitious goal, though, is
merely to approximate white noise well.)

Maurer's universal statistical test [5], being fairly easily
implementable, appears to be particularly recommendable for testing the
quality of the keystreams obtained with a given design configuration.
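
For the reader's convenience we sketch the computation of Maurer's
statistic for block length L = 8, i.e. one block = one byte (Python;
the default parameters Q = 10*2^L and K = 1000*2^L follow the
recommendations in [5], as does the expected value 7.1836656 for a
truly random source at L = 8):

import math

def maurer_statistic(data: bytes, Q: int = 2560,
                     K: int = 256000) -> float:
    """Maurer's universal test statistic f for L = 8: Q
    initialization blocks, then K test blocks; for a truly random
    source f should be close to 7.1836656."""
    assert len(data) >= Q + K
    last = {}                      # last position of each byte value
    for i in range(Q):             # initialization segment
        last[data[i]] = i
    total = 0.0
    for i in range(Q, Q + K):      # test segment
        total += math.log2(i - last.get(data[i], -1))
        last[data[i]] = i
    return total / K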

More than one offset can be applied to the same text. This may be
useful in the special case where only a single text is available. (As
a matter of interest, we mention that the same technique can be
applied to a single record of hardware-generated keystream, enabling,
through combination (with e.g. XOR) of the subsequences thus obtained,
the derivation of a number of different keystreams from the same
record.)
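
A sketch of this derivation of several keystreams from one and the
same record (Python; 'record' is any single long byte sequence,
textual or hardware-generated):

def derived_keystream(record: bytes, off1: int, off2: int,
                      length: int) -> bytes:
    """Obtain a new keystream by XORing two differently offset
    subsequences of one and the same record."""
    return bytes(record[off1 + i] ^ record[off2 + i]
                 for i in range(length))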

It is not necessary to maintain a large library at the user's place.
Certain texts may be downloaded from publicly accessible servers, if
care is taken not to leak thereby a sufficient amount of exploitable
information to the analyst as to which texts actually participate in
the keystream generation process.

Pi appears to have an advantage over the other transcendental numbers
in the present context in that its digits (in hexadecimal
representation) can be computed starting from arbitrarily given
offsets, thus dispensing with the need to store its digit sequence
(albeit at the price of higher computing cost).
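
Such computation from arbitrary offsets is possible via the formula of
Bailey, Borwein and Plouffe for the hexadecimal digits of pi. A
compact floating-point rendering is sketched below (Python; owing to
the limited machine precision it is reliable only for moderate
offsets):

def _bbp_series(j: int, n: int) -> float:
    """Fractional part of sum over k of 16^(n-k) / (8k + j)."""
    s = 0.0
    for k in range(n + 1):         # exact part via modular power
        s = (s + pow(16, n - k, 8 * k + j) / (8 * k + j)) % 1.0
    k = n + 1
    while True:                    # rapidly vanishing tail
        t = 16.0 ** (n - k) / (8 * k + j)
        if t < 1e-17:
            break
        s = (s + t) % 1.0
        k += 1
    return s

def pi_hex_digit(n: int) -> int:
    """Hexadecimal digit of pi at (0-based) position n after the
    point; pi = 3.243F6A88..., so pi_hex_digit(0) == 2."""
    x = (4 * _bbp_series(1, n) - 2 * _bbp_series(4, n)
         - _bbp_series(5, n) - _bbp_series(6, n)) % 1.0
    return int(x * 16)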

Text file compression software is easily obtainable and not subject
to export regulations. Self-implementation may however incur some
non-trivial programming expenditure. Note that such compressions are
reversible (through the corresponding decompression processes). For
multi-media applications, a large number of methods, some involving
sophisticated mathematical transformations, are available that are
irreversible but ensure reproduction of acceptable visual quality
(see [6,7]). It should be noted, however, that in our case we need
care about neither reversibility nor satisfactory reproduction.
(Likewise we are not at all concerned with error detecting/correcting
codes.) Hence the extremely simple compression technique of (5b) of
Section III appears to be particularly valuable for our purpose. (It
may be mentioned that (5b) is utilized in the author's humble design
WEAK4-EX [8], a stream cipher based on pseudo-random number generation
with a 56-bit key as seed.)

We take this opportunity to raise a question that could be of some
indirect relevance to our topic. Suppose there are three character
strings as follows:

1. Purchase 500 million Yens.
2. Sell all Microsoft shares.
3. Cut UK offer by 7 percent.

Suppose string 1 is the proper message to be communicated. Using (4a)
of Section III we obtain a keystream from strings 2 and 3 and XOR it
with string 1 to produce the ciphertext, i.e. the result of XORing all
three strings together. Suppose now that the analyst has a method to
systematically obtain all meaningful possible plaintexts corresponding
to the given ciphertext. Then all three character strings above are
certainly among the set of candidate plaintexts found. How can he know
which is the true message? This ambiguity seems to imply that it may
not be unconditionally necessary to strive for an excellent
approximation to white noise, but that a certain comparatively
inferior quality could suffice. The author knows of no answer to this
seemingly paradoxical issue and would very much appreciate hints and
explanations from the experts.
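
The ambiguity is easily demonstrated (Python; the three strings above
have equal length by construction):

s1 = b"Purchase 500 million Yens."
s2 = b"Sell all Microsoft shares."
s3 = b"Cut UK offer by 7 percent."

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

cipher = xor(xor(s1, s2), s3)    # XOR of all three strings

# each string is a 'valid' decryption under a suitably chosen key:
assert xor(cipher, xor(s2, s3)) == s1
assert xor(cipher, xor(s1, s3)) == s2
assert xor(cipher, xor(s1, s2)) == s3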

V. CONCLUSION.

As our title clearly indicates, this paper does not aim to present
any finalized design but simply sketches some apparently promising
techniques for realizing one. On the basis of our more or less fuzzy
heuristic argument (as given in Section II) alone, it is evidently
entirely futile to predict which of the techniques listed in Section
III are effective and which (possibly recursive) combination of them
will actually lead to an optimal and practically useful stream cipher.
Only actual experiments will be able to tell. However, it is the
author's intuitive conviction (partly based on some test results of
his WEAK series of algorithms [1,8,9], which utilize part of the
techniques listed in this paper) that the direction we have been
pursuing is a rewarding one, worthy of serious further investigation.

The author wishes to thank CWL for reading a draft of the present
paper.

Comments, critiques and suggestions for improvement are sincerely
solicited.

M. K. Shen

LITERATURE.

[1] http://www.stud.uni-muenchen.de/~mok-kong.shen/#paper13

[2] http://www.stud.uni-muenchen.de/~mok-kong.shen/#paper17

[3] A. J. Menezes et al., Handbook of Applied Cryptography. CRC
    Press, 1997.

[4] http://www.stud.uni-muenchen.de/~mok-kong.shen/#paper9

[5] U. M. Maurer, A Universal Statistical Test for Random Bit
    Generators. J. Cryptology (1992) 5:89-105.

[6] W. Kou, Digital Image Compression. Kluwer, 1995.

[7] V. Bhaskaran, K. Konstantinides, Image and Video Compression
    Standards. Kluwer, 1995.

[8] http://www.stud.uni-muenchen.de/~mok-kong.shen/#paper16

[9] http://www.stud.uni-muenchen.de/~mok-kong.shen/#paper15

