Friday, March 02, 2007

Data Encryption Techniques

Introduction

Often there has been a need to protect information from 'prying eyes'. In the electronic age, information that could otherwise benefit or educate a group or individual can also be used against such groups or individuals. Industrial espionage among highly competitive businesses often requires that extensive security measures be put into place. And, those who wish to exercise their personal freedom, outside of the oppressive nature of governments, may also wish to encrypt certain information to avoid suffering the penalties of going against the wishes of those who attempt to control.

Still, the methods of data encryption and decryption are relatively straightforward, and easily mastered. I[1] have been doing data encryption since my college days, when I used an encryption algorithm to store game programs and system information files on the university mini-computer, safe from 'prying eyes'. These were files that raised eyebrows amongst those who did not approve of such things, but were harmless [we were always careful NOT to run our games while people were trying to get work done on the machine]. I was occasionally asked what this "rather large file" contained, and I once demonstrated the program that accessed it, but you needed a password to get to 'certain files' nonetheless. And, some files needed a separate encryption program to decipher them.

Methods of Encrypting Data

Traditionally, several methods can be used to encrypt data streams, all of which can easily be implemented through software, but not so easily decrypted when either the original or its encrypted data stream are unavailable. (When both source and encrypted data are available, code-breaking becomes much simpler, though it is not necessarily easy). The best encryption methods have little effect on system performance, and may contain other benefits (such as data compression) built in. The well-known 'PKZIP®' utility offers both compression AND data encryption in this manner. Also DBMS packages have often included some kind of encryption scheme so that a standard 'file copy' cannot be used to read sensitive information that might otherwise require some kind of password to access. They also need 'high performance' methods to encode and decode the data.

Ways of encrypting data


1. The 'translation table', meets this need very well. Each 'chunk' of data (usually 1 byte) is used as an offset within a 'translation table', and the resulting 'translated' value from within the table is then written into the output stream. The encryption and decryption programs would each use a table that translates to and from the encrypted data. Further, such a method is relatively straightforward for code breakers to decipher - such code methods have been used for years, even before the advent of the computer. Still, for general "unreadability" of encoded data, without adverse effects on performance, the 'translation table' method lends itself well.

a. A modification to the 'translation table' uses 2 or more tables, based on the position of the bytes within the data stream, or on the data stream itself. Decoding becomes more complex, since you have to reverse the same process reliably. An example of this method might use translation table 'A' on all of the 'even' bytes, and translation table 'B' on all of the 'odd' bytes. Unless a potential code breaker knows that there are exactly 2 tables, even with both source and encrypted data available the deciphering process is relatively difficult.

b. Similar to using a translation table, 'data repositioning' lends itself to use by a computer, but takes considerably more time to accomplish. A buffer of data is read from the input, then the order of the bytes (or other 'chunk' size) is rearranged, and written 'out of order'. The decryption program then reads this back in, and puts them back 'in order'. Often such a method is best used in combination with one or more of the other encryption methods mentioned here, making it even more difficult for code breakers to determine how to decipher your encrypted data. The most common examples are anagrams. Some anagrams are easier than others to decipher, but a well written anagram is a brain teaser nonetheless, especially if it's intentionally misleading.

2. My favorite methods, however, involve something that only computers can do: word/byte rotation and XOR bit masking. If you rotate the words or bytes within a data stream, using multiple and variable direction and duration of rotation, in an easily reproducable pattern, you can quickly encode a stream of data with a method that is nearly impossible to break. In some cases, you may want to detect whether data has been tampered with, and encrypt some kind of 'checksum' into the data stream itself. This is useful not only for authorization codes but for programs themselves.

a. A virus that infects such a 'protected' program would no doubt neglect the encryption algorithm and authorization/checksum signature. A cyclic redundancy check is one typically used checksum method. It uses bit rotation and an XOR mask to generate a 16-bit or 32-bit value for a data stream, such that one missing bit or 2 interchanged bits are more or less guaranteed to cause a 'checksum error'. The method is somewhat well documented, and standard. But, a deviation from the standard CRC method might be useful for the purpose of detecting a problem in an encrypted data stream, or within a program file that checks itself for viruses.

3. Key-Based Encryption Algorithms:

a. One very important feature of a good encryption scheme is the ability to specify a 'key' or 'password' of some kind, and have the encryption method alter itself such that each 'key' or 'password' produces a different encrypted output, which requires a unique 'key' or 'password' to decrypt. This can either be a 'symmetrical' key (both encrypt and decrypt use the same key) or 'asymmetrical' (encrypt and decrypt keys are different). The popular 'PGP' public key encryption, and the 'RSA' encryption that it's based on, uses an 'asymmetrical' key. The encryption key, the 'public key', is significantly different from the decryption key, the 'private key', such that attempting to derive the private key from the public key involves many hours of computing time, making it impractical at best.

b. . In nearly all cases, if an operation is performed on 'a', resulting in 'b', you can perform an equivalent operation on 'b' to get 'a'.

c. In the case of the RSA encryption algorithm, it uses very large prime numbers to generate the public key and the private key. Although it would be possible to factor out the public key to get the private key (a trivial matter once the 2 prime factors are known), the numbers are so large as to make it very impractical to do so.

d. What PGP does (and most other RSA-based encryption schemes do) is encrypt a symmetrical key using the public key, then the remainder of the data is encrypted with a faster algorithm using the symmetrical key. The symmetrical itself key is randomly generated, so that the only way to get it would be by using the private key to decrypt the RSA-encrypted symmetrical key.

e. Example: Suppose you want to encrypt data (let's say this page) with a key of 12345. Using your public key, you RSA-encrypt the 12345, and put that at the front of the data stream (possibly followed by a marker or preceded by a data length to distinguish it from the rest of the data). THEN, you follow the 'encrypted key' data with the encrypted page text, encrypted using your favorite method and the key '12345'. Upon receipt, the decrypt program looks for (and finds) the encrypted key, uses the 'private key' to decrypt it, and gets back the '12345'. It then locates the beginning of the encrypted data stream, and applies the key '12345' to decrypt the data. The result: a very well protected data stream that is reliably and efficiently encrypted, transmitted, and decrypted. [2]

4. A brand new 'multi-phase' method (invented by ME)

a. I have (somewhat) recently developed and tested an encryption method that is (in my opinion) nearly uncrackable. The reasons why will be pretty obvious when you take a look at the method itself. I shall explain it in prose, primarily to avoid any chance of prosecution by those 'GUMMINT' authorities who think that they oughta be able to snoop on anyone they wish, having a 'back door' to any encryption scheme, etc. Well, if I make the METHOD public, they should have the same chance as ANYONE ELSE for decrypting things that use this method.

i. Using a set of numbers (let's say a 128-bit key, or 256-bit key if you use 64-bit integers), generate a repeatable but highly randomized pseudo-random number sequence (see below for an example of a pseudo-random number generator).

ii. 256 entries at a time, use the random number sequence to generate arrays of "cipher translation tables" as follows:

1. fill an array of integers with 256 random numbers (see below)

2. Sort the numbers using a method (like pointers) that lets you know the original position of the corresponding number

3. Using the original positions of the now-sorted integers, generate a table of randomly sorted numbers between 0 and 255. If you can't figure out how to make this work, you could give up now... but on a kinder note, I've supplied some source below to show how this might be done - generically, of course.

4. Now, generate a specific number of 256-byte tables. Let the random number generator continue "in sequence" for all of these tables, so that each table is different.

5. Next, use a "shotgun technique" to generate "de-crypt" cipher tables. Basically, if a maps to b, then b must map to a. So, b[a[n]] = n. get it? ('n' is a value between 0 and 255). Assign these values in a loop, with a set of 256-byte 'decrypt' tables that correspond to the 256-byte 'encrypt' tables you generated in the preceding step. NOTE: I first tried this on a P5 133Mhz machine, and it took 1 second to generate the 2 256x256 tables (128kb total). With this method, I inserted additional randomized 'table order', so that the order in which I created the 256-byte tables were part of a 2nd pseudo-random sequence, fed by 2 additional 16-bit keys.

6. Now that you have the translation tables, the basic cipher works like this: the previous byte's encrypted value is the index of the 256-byte translation table. Alternately, for improved encryption, you can use more than one byte, and either use a 'checksum' or a CRC algorithm to generate the index byte. You can then 'mod' it with the # of tables if you use less than 256 256-byte tables. Assuming the table is a 256x256 array, it would look like this:
crypto1 = a[crypto0][value] where 'crypto1' is the encrypted byte, and 'crypto0' is the previous byte's encrypted value (or a function of several previous values). Naturally, the 1st byte will need a "seed", which must be known. This may increase the total cipher size by an additional 8 bits if you use 256x256 tables. Or, you can use the key you generated the random list with, perhaps taking the CRC of it, or using it as a "lead in" encrypted byte stream. Incidentally, I have tested this method using 16 'preceding' bytes to generate the table index, starting with the 128-bit key as the initial seed of '16 previous bytes'. I was then able to encrypt about 100kbytes per second with this algorithm, after the initial time delay in creating the table.

7. On the decrypt, you do the same thing. Just make sure you use 'encrypted' values as your table index both times. Or, use 'decrypted' values if you'd rather. They must, of course, match.

5. However, if you're at a loss for a random sequence consider a FIBBONACCI sequence, using 2 DWORD's (like from your encryption key) as "seed" numbers, and possibly a 3rd DWORD as an 'XOR' mask. An algorithm for generating a random sequence of numbers, not necessarily connected with encrypting data, might look as follows:

  unsigned long dw1, dw2, dw3, dwMask;
int i1;
unsigned long aRandom[256]
dw1 = {seed #1};
dw2 = {seed #2};
dwMask = {seed #3};
// this gives you 3 32-bit "seeds", or 96 bits total
  for(i1=0; i1 < style="">
{
dw3 = (dw1 + dw2) ^ dwMask;
aRandom[i1] = dw3;
dw1 = dw2;
dw2 = dw3;
}

If you wanted to generate a list of random sequence numbers, let's say between zero and the total number of random numbers in the list, you could try something like THIS:
int __cdecl MySortProc(void *p1, void *p2)
{
  unsigned long **pp1 = (unsigned long **)p1;
  unsigned long **pp2 = (unsigned long **)p2;
  if(**pp1 < **pp2)
    return(-1);
  else if(**pp1 > *pp2)
    return(1);
   return(0);
}
...
  int i1;
  unsigned long *apRandom[256];
  unsigned long aRandom[256];  // same array as before, in this case
  int aResult[256];  // results go here
  for(i1=0; i1 <>
  {
    apRandom[i1] = aRandom + i1;
  }

// now sort it
  qsort(apRandom, 256, sizeof(*apRandom), MySortProc);
// final step - offsets for pointers are placed into output array
  for(i1=0; i1 <>
  {
    aResult[i1] = (int)(apRandom[i1] - aRandom);
  }
...

The result in 'aResult' should be a randomly sorted (but unique) array of integers with values between 0 and 255, inclusive. Such an array could be useful, for example, as a byte for byte translation table, one that could easily and reliably be reproduced based solely upon a short length key (in this case, the random number generator seed); however, in the spirit of the 'GUTLESS DISCLAIMER' (below), such a table could also have other uses, perhaps as a random character or object positioner for a game program, or as a letter scrambler for an anagram generator.

GUTLESS DISCLAIMER: The sample code above does not in and of itself constitute an encryption algorithm, or necessarily represent a component of one. It is provided solely for the purpose of explaining some of the more obscure concepts discussed in prose within this document. Any other use is neither proscribed nor encouraged by the author of this document, S.F.T. Inc., or any individual or organization that is even remotely connected with this web site.


[1] The author is a employee of Cisco, and the following extract is taken from his recollections.

2 comments:

Anonymous said...

its too large, data is good, but imprecise

Unknown said...

Very impressive detail. All the content covered in this post about data encryption standard is simply great and useful. Using it one can easily learn about it.
digital signature Microsoft