How does Foursquare fill its database

FOUR SQUARE CHIFFRE. Table of Contents. Michael Hauschild, Norman Brügmann. June 20, 2008

Transcript

1 FOUR SQUARE CHIFFRE Michael Hauschild, Norman Brügmann June 20, 2008 Table of contents 1 Basic information on cryptography What actually means cryptography What is a chire Basic procedure for chirping Simple substitution Homophonic substitution Polygraphic substitution Polyalphabetic substitution Transposition method Types and examples of Chiren Monoalphabetic Chiren The key Felix Marie Delastelle Preface to the person Biographical information about Delastelle Important Delastelle Chiren Bid Chire Trid Chire

2 3 The Four-Square-Chire General Encryption Technique: Structure Encryption Attack Disadvantages Cryptanalysis of an Encrypted Text with the Analyze-O-Matic Test Bases Results Conclusion Project Documentation Task Requirements Project Objectives (target-actual comparison) Timing Cost-effectiveness Project progress Conclusion Appendix Project diary Functional description of the Four- Square implementation Bibliography 20 1 Basic information on cryptography 1.1 What does cryptography actually mean? The basic task of cryptography is to encrypt and decrypt information with the aim of limiting access to the information content. Only the person or institution should know the content for which it is intended. However, cryptography is not a creation of modern science, as ancient inscriptions and records show. Highly complex algorithms have developed from the very simple processes of antiquity. We nd many of them in our everyday life without consciously noticing them, be it when we get money from the machine or log into our account. 2

3 1.2 What is a Chire secret script (cryptography) is a script that only the initiated can read. When encrypting, either specially agreed characters are used or the order of the characters is reversed. 1.3 Basic procedures for chirring Simple substitution Most classical chiren are based on the substitution of characters or character strings within the plain text with other letters and symbols, depending on the method, a distinction can primarily be made between monoalphabetic and polyalphabetic chiren Homophonic substitution A special and improved form of substitution is the homophonic substitution. Several secret characters are assigned to the individual plain text characters so that the encrypted text consists of significantly more characters than the original plain text. This method has the decisive advantage that recognizing the relative frequency of certain characters is made more difficult if several secret characters are assigned to it. The same frequency distribution of all characters can thus be achieved Polygraphic substitution In polygraphic substitution, or polygram substitution, instead of individual characters of the plain text, several are converted into new groups of ciphertext letters, in so-called polygrams Polyalphabetic substitution This is an extension of the monoalphabetic substitutions in which instead of just one ciphertext alphabet several are used to encrypt the plaintext. 1.4 Transposition procedure In cryptography, transposition is an encryption procedure in which the characters of a plain text are rearranged. The specific characteristics of the respective character are retained, only the position in which it is located is changed. There is no substitution of the character set here, only its reordering within the plain text. These methods are often used in Chiren where the plain text is displayed within a matrix. The encryption is done by simply swapping 3

4 of the rows with the columns (transposition). Freely selectable algorithms for re-sorting result in an almost unlimited number of possible variations if the plain text contains enough characters. 1.5 Types and Examples of Chiras Monoalphabetic Chiras The monoalphabetic Chiren are the simplest form of substitution cipher. The basic idea is that the letters of a plain text are replaced by other symbols using an allocation table. One of the best-known examples of monoalphabetic encryption is the so-called Cesar-Chire, in which the message is encrypted by moving a certain number of letters in the alphabet, so to speak shifting the alphabets against each other. Since each letter is always encoded with the same Chiren letter, the weakness of this method is based on the probability of the letters appearing. If you count the frequency of the letters in the ciphertext, you get the same distribution as in the plain text. It is then easy to assign a plaintext letter to each Chire letter. Figure 1: Alphabet shift of the Ceasar-Chire (a) Source: Own polyalphabetic chiras The polyalphabetic chiras have the property, compared to the monoalphabetic ones, that they use more than one assignment table in alternation, that is, several secret characters are assigned to each plaintext character. The best known example of this chire is probably the Vigenere Chire. 4th

5 Figure 2: Vigen square with example (a) Source: Wikipedia 1.6 The key We have explained how the basic procedures work, but we ignored how exactly the texts are now encrypted and, above all, how they can be decrypted again. This is where the concept of the key, also known as the password, comes into play. The content of the key contains information on how the plain text was encrypted with the respective chi re and, more importantly, how the ciphertext can be made readable again. In order to achieve good encryption and thus a high level of security for its information, the key size used is of decisive importance; the more complex the key, the stronger the encryption factor for most chi ren, which increases exponentially with increasing key length. However, a large key also leads to problems, since it becomes more difficult to remember with increasing complexity (especially if you also have to memorize the code tables). In the simplest case, it consists of an easy-to-remember password. For today's demands, the security of such keys is no longer sufficient, as they are usually too short and can be found by simply trying the colloquially called bruteforcing. 5

6 Figure 3: Schatic representation of a key (a) Source: hp.kairaven.de 2 Felix Marie Delastelle 2.1 Preface to the person Felix Marie Delastelle was a French cryptographer. He became known for the development of various polyalphabetic substitutions and transpositions Chiren between the end of the 19th century. Despite his extensive influence and the large number of works in the field of cryptography, only sparse information can be found about him, some of which has been incompletely documented. Therefore, in this section we will primarily turn to his work in the field of cryptography. 2.2 Biographical information about Delastelle Felix Marie Delastelle was born in France in 1840 and was the son of a sailor who, as Delastelle, was four years old, was lost on the high seas and was therefore probably the victim of a skiing accident. He attended school in Saint Malo, a small coastal town in the Bretange in northwestern France. There he also spent most of his life until his retirement in 1900 as an employee of a warehouse in the port of the city. He then rented a small room in a holiday hotel and wrote his treatise on his work, which was contained in the 150-page work: Delastelle, F. Traité Élémentaire de Cryptographie. Paris: Gauthier-Villars, which appeared in 1902. Sixieme Partie-Appareils Cryptography. In this book, 6

7 he various cryptographic processes, in particular his self-developed Four-Square-Chire, which is an extension of the Playfair-Chire named after Lord Lyon and one of the polygraphic chirs. By adding additional squares, Delastelle was able to increase the strength of this chire considerably. The exact details of the implementation are explained in more detail in section 4. Delastelle died shortly before his book was published in This book, however, was not his only publication. As early as 1895 he published a treatise by Bid-Chire in the French newspaper Revue du Génie civil under the title Cryptographie nouvelle. The Bid-Chire is based on the Polybius square, a chire that was already used in antiquity. This combined Delastelle with transposition and fractionation processes, whereby each plain text symbol is converted into a certain foreign text symbol. 2.3 Important Delastelle Chiren Bid Chire The Bid-Chire, developed by Delastelle around 1901, is, like the Four Square-Chire, an extension of an already existing antique Chire, the so-called Polybios Square. The Polybius square forms the basis for this chire. Inside the square is the actual alphabet, in which the letter J is replaced by the I, the individual letters of which can be clearly identified using a vertical and horizontal index. This makes it possible to encode a letter with two numbers. Furthermore, the letters can be arranged in any combination in a square. Figure 4: Polybio's square (a) Source: Wikipedia 7

8 Since this chire is basically a monoalphabetic chire, the respective characteristics of the language are not encoded, which means that it can be easily broken using basic cryptanalytic methods so that the characteristics of the language are also encoded, Delastelle has the chire extended by a transposition method and additional replacement to a diagraph chire. The procedure is as follows: 1. Any Polybios square is generated and a message is encrypted, the number combinations for the respective letters are arranged side by side in columns. Figure 5: Coding (a) Source: Wikipedia 2. After the coded message has been generated, the columns are transposed into a single line. With this the next step of the encryption can now take place Figure 6: Transposition (a) Source: Wikipedia 3. The numbers are now broken down one after the other into individual pairs and re-encrypted into letters using the Polybios square. Figure 7: Reverse encryption (a) Source: Wikipedia With this method, the chire has been significantly strengthened. This has the following causes: we encrypted our plaintext with two additional steps and thus created a diusion in the characteristics of the language. It's 8

9 it is no longer possible to use the coincidence index of the foreign text to reliably identify the language that is hidden in the plain text. The attack on the Chire is similar to that of the Playfairchire Trid Chire The Trid Chire is an extension of the Bid Chire by two additional squares, but the underlying encryption mechanism has remained unchanged. The difference is that the alphabet is divided into three individual 3x3 squares with a total of 27 fields. Compared to Bid, this enables the entire alphabet to be mapped. Each field has the same horizontal and vertical indices. So that the letters can still be clearly identified, each square also has its own index. Each letter is coded by exactly three numbers. The encryption is carried out using the same method as with the Bid Chire. The attack is similar to the Bid Chire, but due to the additional squares, it is associated with significantly more effort. Figure 8: Trid encryption method (a) source: Wikipedia Figure 9: Encoding Trid (a) source: Wikipedia 9

10 Figure 10: Transposition Trid (a) Source: Wikipedia 3 The Four-Square Chire 3.1 General The Four-Square Chire is a polyalphabetic digraph chire with symmetrical encryption technology, which is based on the fundamentals of the Playfair Chire, but has been expanded to include additional methods . It was developed by the French cryptologist Felix Marie Delastelle (), along with other well-known chirs, such as the bid, trid or two-square chire. Like the other Chiren, the Four-Square-Chire is described in the book Traité élémentaire de cryptologie, which was published in 1902 by Gauthier-Villars. The name comes from the structure of the chire. It consists of four (four) squares. 3.2 Encryption technology: In the case of the Chire, individual pairs of letters (bigrams) are exchanged, which makes it a polygraphic substitution chire. As a result, the Chire is more difficult to crack by frequency analysis than monographic Chiren, since here 676 instead of 26 possibilities have to be tested. Cracking them is relatively easy these days, but the prerequisite for this is that there is enough text that can be analyzed for specifics for this chire. Another aspect of the Chire is that 2 keys are used, so the Chire is less susceptible to the analysis of upside-down digraphs, as is the case with the Playfair Chire. However, this eect is canceled again if the same key is selected. 3.3 Structure The structure of the chire is relatively simple, but sometimes very (time) consuming, especially when working with paper, as was common at the time. It consists of 4 squares (matrices), each consisting of 5x5 (a total of 25) fields. These fields are then filled with the letters of the alphabet. There are various ways of adapting the alphabet to the 5x5 matrix. You either leave out less common letters, such as the Q, or combine the J and the I or, as a third option, the W is replaced by VV (technique from letterpress printing). Two of the four squares, top left and bottom right, are assigned a plain text alphabet, i.e. an unchanged alphabet (taking into account the adaptation to the 5x5 matrix) from the top left, line by line to 10

11 bottom right. The other two matrices, top right and bottom left, are the so-called encrypted squares. A key and / or an encryption technique is set on each of them. In the case of the key, a keyword, which only the sender and recipient should know, is written at the beginning. In doing so, missing double letters of the keyword and umlauts are rewritten (e.g. ü becomes ue). The rest of the missing letters are simply filled in to complete the square. A distinction is made between 2 methods of encryption technology. On the one hand the normal unchanged encryption, in which the letters are filled alphabetically from left to right and from top to bottom, and on the other hand in a spiral, where the letters are filled up like in a spiral from the top left to the middle. In our example are plain text square and Encrypted square with lower case (plain text) and upper case (encrypted) and separated by color for better visualization. Figure 11: Four-Square Squares (a) Source: Own 3.4 Encryption With the encryption itself, the plaintext is first broken down into individual bigrams (digraphs), i.e. into individual pairs of letters. If a letter is missing in the last bigram, it is simply replaced by one that makes less sense, here for example the Y. So hello becomes ha ll oy. Then each individual diagraph is viewed and encrypted separately. The first letter of the letter pair is found in the square at the top left and marked accordingly. The second letter is then identified and marked in the square at the bottom right. The marked fields are then connected to a large square using the encrypted matrices. 11

12 Figure 12: Coding technique (a) Source: Own The resulting new corner points then form the encrypted bigrams, with the encryption of the respective plain text letter in the same line as this. Figure 14: Coding (a) Source: Own These steps are repeated for the individual pairs of letters. Existing spaces are then removed and the encrypted text is obtained. In our example, hello becomes LHHYJD. 12

13 Figure 13: The square in the squares (a) Source: Own Figure 15: Coded content (a) Source: Own The image also shows how the Chire works as an algorithm. The decryption works in the opposite way, but according to the same principle. 3.5 Attack If there is enough text, the four-square chire can be cracked easily. If both plain text and encrypted text are known, the key can be easily obtained. If only the encrypted text is available, the brute force cryptanalysis can be used to carry out a frequency analysis of the bigrams and compare it with the typical language frequency of the bigrams of the respective language in order to get to the plain text and thus also to the key. 3.6 Disadvantages The Four-Sqare Chire is stronger than the Playfair-Chire, but it is also more cumbersome because it uses 2 keys that have to be transmitted to the addressee of the ciphertext. Furthermore, the preparation is 13

14 relatively time-consuming for encryption and decryption, since all of the many squares must be prepared first (working with paper).In addition, as already mentioned, cracking the chire is relatively easy if there is enough text. Since the Four-Sqaure Chire was only slightly stronger than the Playfaiy Chire, but more complex, it became less well-known and its direct competitor came to the fore. 3.7 Cryptanalysis of an encrypted text with the analysis O-Matik test fundamentals Original text: Excerpt from the Wikipedia article about JW v.goethe Keys: Super, Duper Special feature in the Polybios square: Omission of Q Analysis tool: Results Figure 16: Overview of the analysis (a ) Source: Own As you can see here very well, the typical distribution of the letters of the respective language is greatly changed. This can be seen from the coincidence, which for a German text is approx. 0.0762 (here 0.0782). However, the encrypted text has a coincidence of 0.0603. So in the 14th

At the first moment I couldn't clearly see that this is a German text. When it comes to the distribution of letters, however, some clear similarities can be seen, for example the uniqueness of the e and the rough overall distribution of the other letters. However, the difference between t and u clearly stands out. However, it must be said that this distribution is only an example and the distribution of the letters depends very much on the keys. Figure 17: Distribution of letters (a) Source: Own In the bigram analysis, however, the real weakness of the Chire becomes clear. Figure 18: Bigrams (a) Source: Own It can be clearly seen that the distribution of the bigrams is roughly the same in both texts, so if enough text is available, you can very easily access the encrypted squares, which are also the each key included. There can be a few differences here, as with the Four-Square Chire only 2 letters are encrypted. So it can happen that the individual letters of a plaintext bigram are encrypted differently. (Example: it encrypts very often in the German alphabet, but here ve and rs are encrypted separately and thus result in completely different combinations). However, such inconsistencies can easily be confirmed by, for example, testing (does it make sense? Brute-force method). In conclusion, one can say that when analyzing a four-square-encrypted text in the 15th

16 As a rule, it is not possible to infer its origin directly and there may be strange similarities in the distribution of letters, which, however, are not very useful. However, if you look a little further, you can clearly see the weak point of the chire (condition: enough text). The bigrams. The conclusion that can be drawn from this is that the Four-Square-Chire can easily be analyzed if there is enough text. 3.8 Conclusion The Four-Square Chire is a relatively strong Chire for the premodern conditions at the time, which impresses with advantages over monographic chirs and their competitor at the time, the Playfair Chire. In spite of all this, it was not able to assert itself strongly enough because it was relatively too time-consuming. Thus, there were only isolated known uses. For today's conditions, it would be easily decipherable even with enough text. 4 Project documentation 4.1 Task The aim was to create a project on the subject of cryptographic procedures, especially the Four Square Chire developed by Felix Marie Delastelle. Background information on the life and work of Delastelle was to be gathered, as well as a detailed description of the Four Square Chire. Furthermore, an implementation of the Chire for the website should be created, a website that deals specifically with the topic of cryptography. The following requirements were made for this. It should be possible to encrypt and decrypt texts using a password. This had to be done through an easy-to-use interface. At the end of the project work, the entire work must be presented in a lecture. 4.2 Requirements The project work is to be created with the Lyx software, as this has auto format functions and thus corresponds to a scientific standard. This must be given to the project manager before the work is presented. The implementation is to be created in PHP and integrated into the framework of the cryptography playground. When using a database, the use of SQLlite is mandatory. The software tools for the implementation could be chosen freely. The presentation must be carried out in digital form. In addition, the individual progress and work steps should be noted in a project diary so that the project management is informed about existing problems. 16

17 4.3 Project goals (target / actual comparison) The documentation should describe the origins and use of the chirs. This turned out to be problematic in some cases. The origin of the chire was well documented in numerous sources, so that this goal was achieved. On the other hand, their historical use is difficult to determine in practice. Examples of their active use such as as military chires could not be found. Function and working methods of the chirs could be described in detail with the available sources. The biggest problems, however, were with the search for biographical information on the person Delastelle. The information we found comes from an English Wikipedia article which did not have a list of sources. In addition, all other information we found also referred to this Wikipedia article. The publication on Delastelle's work could be found in the British Library. It was ordered, but it would have had to be digitized at great expense, so we decided not to use it, especially since it first had to be translated from French into German. In order to be able to provide sufficient material about Delaselle's work, we have decided to briefly discuss the Bid and Trid Chire. The implementation of the Chire was successful, but was carried out with the help of third parties, since we only came into contact with PHP through this project and only started here to implement a larger project in terms of programming. 4.4 Time planning A time period of 8 weeks was set for the project. Our more detailed time planning and our respective progress are noted in our project diary. 4.5 Economic efficiency For a better economic division, we have divided our areas to be processed. One of them focused more on implementation and the other on research. The general procedure and summary of the work were carried out together for better coordination. 4.6 Project progress At the beginning of the project, research was carried out on the Internet in order to obtain an overview of the topic. The implementation of the Four Square Chire was then carried out and tested with the material found. Then we started to write the thesis. This was gradually expanded and repaired. Finally, we implemented the framework for the cryptography playground and created an associated overview. 17th

18 4.7 Conclusion The basic goals that we have set for ourselves have been achieved. With the exception of the background to the person of Delastelle and an implemented attack on the Chire. No rich sources could be found for this. Nevertheless, we are satisfied with the results. 5 Appendix 5.1 Project diary May 10th Implementation of the Chire May research work May 14th Order the Delastell publication from the British Library May 15th Response from the British Library, order canceled May 30th Beginning of the technical work 7. Creation of the project documentation June 16th Adjustment of the implementation to the framework June 20th Submission of the thesis June 24th Presentation of the results 5.2 Functional description of the Four Square implementation class main The class main contains the complete implementation of the Chire and all the necessary functions to enable the encryption. The following instance variables were dened: Instance variables private $ ab1 = array (); private $ ab2 = array (); private $ ab3 = array (); private $ ab4 = array (); The variables ab1-ab4 store the individual squares of the chire, by default they contain an alphabet with 25 letters, they get their values ​​from the variable character set. The arrays are two-dimensional. 18th

19 private $ input = ""; The input variable saves the text entered via the browser window private $ character set = array (); The variable character set contains the actual alphabet. Functions function main () The alphabet is generated in the main function (constructor). function output (array) Outputs the array in which the squares were saved function setkey1exchange (key) function setkey2exchange (key) After the keys for the chire have been specified, they must be inserted into the squares ab2 and ab3. First, a temporary variable is created in which the contents of the passwords are saved in a separate array using the split () function, then the array_unique () function removes all letters that occur multiple times in the password so that it can be inserted into the squares. This guarantees the constant size of the squares after the password has been inserted. However, this function does not insert the key as described by Delastelle. Instead of inserting it at the beginning of the alphabet, it is mixed in here. function setkey1 insert (key) function setkey2 insert (key) These functions insert the key by inserting it at the beginning of the alphabet. Before this is done, it is checked whether the number of letters in the alphabet is a square. For this purpose, the root of the number of letters was determined and checked for its module. Now the key has to be inserted into the alphabet. The functions array_unique (), array_merge () and split () are required for this. Split () writes the key in a separate array, this is passed along with the two squares to array_merge () and merged into an array. Array_uniqe () removes at the end the double letters that were created by inserting the key, so that the original size of the array is restored. function getcrypt () 19

20 This function returns the encrypted text. function getencrypt () This function returns the decrypted text. function set input (string) For the chire it is necessary that the text to be encrypted contains an even number of characters. The function checks this property and adds a y to the text if the number is odd. function getxy ($ char, $ array) This function was used to test the implementation and should give an overview of the indexes of the arrays. function set charset ($ charset) This function fills the arrays with the alphabet specified in the constructor. function getab1 () function getab2 () function getab3 () function getab4 () These functions output the contents of the respective squares. 6 List of sources

21 List of figures 1 Alphabet shift of the Ceasar-Chire Vigener square with an example Schatic representation of a key Polybios square Coding transposition Reverse encryption Trid encryption method Coding Trid transposition Trid Four-square squares Coding technique Coding The square in the squares Coded content Overview of the analysis Letter distribution Bigram