Squashed commit of the following:

commit 415a229
Author: MihailKovachev <[email protected]>
Date:   Sun Jan 19 23:11:23 2025 +0200

    Migrated Reverse Engineering from the Cyberclopaedia

commit d60e4d4
Author: MihailKovachev <[email protected]>
Date:   Sun Jan 19 23:04:43 2025 +0200

    Migrated Post Exploitation from the Cyberclopaedia

commit 4472858
Author: MihailKovachev <[email protected]>
Date:   Sun Jan 19 22:59:06 2025 +0200

    Migrated Exploitation from the Cyberclopaedia

commit d379711
Author: MihailKovachev <[email protected]>
Date:   Sun Jan 19 22:47:02 2025 +0200

    Migrated System Internals from the Cyberclopaedia

commit 743d61d
Author: MihailKovachev <[email protected]>
Date:   Sun Jan 19 22:35:26 2025 +0200

    Migrated Reconnaissance from the Cyberclopaedia

commit e927910
Author: MihailKovachev <[email protected]>
Date:   Sun Jan 19 22:28:41 2025 +0200

    Migrated Cryptography content from the Cyberclopaedia

commit 4314c40
Author: MihailKovachev <[email protected]>
Date:   Sun Jan 19 21:06:51 2025 +0200

    Migrated Hardware Hacking from Cyberclopaedia

commit ec4ae25
Author: MihailKovachev <[email protected]>
Date:   Sun Jan 19 21:01:29 2025 +0200

    Migrated Networking from Cyberclopaedia

commit 49327b9
Author: MihailKovachev <[email protected]>
Date:   Sun Jan 19 19:42:14 2025 +0200

    Backup. Preparation for content merge

commit 7a7e670
Author: MihailKovachev <[email protected]>
Date:   Sun Jan 19 16:16:30 2025 +0200

    Probability theory basics

commit e970fe0
Author: MihailKovachev <[email protected]>
Date:   Sun Jan 19 16:16:04 2025 +0200

    Coordinate-independent definition of gradient

commit 6bc1709
Author: MihailKovachev <[email protected]>
Date:   Sat Jan 18 22:38:11 2025 +0200

    Method of Lagrange multipliers

commit 6dd1611
Author: MihailKovachev <[email protected]>
Date:   Fri Jan 17 19:07:27 2025 +0200

    Basics of combinatorics

commit 9fd0f2a
Author: MihailKovachev <[email protected]>
Date:   Thu Jan 16 17:08:03 2025 +0200

    Chess rules

MihailKovachev · Jan 19, 2025 · 807c5fa
1 parent b15eec2

Showing 758 changed files with 12,979 additions and 78 deletions.
diff --git a/vault/Chess/Test.md b/vault/Chess/Test.md
diff --git a/vault/Computer Science/Cryptography/Breaking Classical Cryptrography.md b/vault/Computer Science/Cryptography/Breaking Classical Cryptrography.md
@@ -0,0 +1,79 @@
+# The Shift Cipher
+
+One of the oldest known ciphers is Caesar's cipher. Julius Caesar encrypted his messages by shifting every letter of the alphabet three positions forward, wrapping around when the end of the alphabet is reached. Consequently, `A` is mapped to `D` and `Z` is mapped to `C`.
+
+An immediate problem with this cipher is the lack of a key: the shift amount is always the same. A natural extension is to let the shift amount vary, turning it into a key whose possible values are the integers from 0 to 25. The key space is therefore $\mathcal{K} = \{0, ..., 25\}$.
+
+An encryption algorithm $\operatorname{Enc}_k$ takes a plaintext $m$, shifts each of its letters forward by $k$ positions and outputs a ciphertext $c$. Conversely, a decryption algorithm $\operatorname{Dec}_k$ takes the ciphertext $c$ and shifts its letters *backwards* by $k$ positions to retrieve the original plaintext. Mapping the alphabet to the set $\{0,...,25\}$ ($a = 0, b = 1$, etc.) gives a more mathematical description. Encryption of any message $m = m_1 \cdot\cdot\cdot m_l$ (with $m_i \in \{0,...,25\}$) using the key $k$ is given by
+
+$$
+\operatorname{Enc}_k (m_1 \cdot\cdot\cdot m_l) = c_1 \cdot\cdot\cdot c_l, \hspace{1cm} \text{where } c_i = [(m_i + k) \mod 26]
+$$
+
+The notation $[a \mod N]$ denotes the remainder of $a$ upon division by $N$, so that $0\leq[a \mod N] < N$, and $\cdot\cdot\cdot$ denotes concatenation, not multiplication. Decryption of a ciphertext $c = c_1 \cdot\cdot\cdot c_l$ using a key $k$ is then given by
+
+$$
+\operatorname{Dec}_k (c_1 \cdot\cdot\cdot c_l) = m_1 \cdot\cdot\cdot m_l, \hspace{1cm} \text{where } m_i = [(c_i - k) \mod 26]
+$$
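The two formulas translate almost directly into code. The following Rust sketch assumes lowercase ASCII text. As an extra convention not covered by the formulas, it copies spaces and punctuation through unchanged. The function names are illustrative only.

```rust
/// Shifts every lowercase ASCII letter forward by `shift` positions (mod 26);
/// all other characters are copied unchanged (an assumption of this sketch).
fn shift_letters(text: &str, shift: u8) -> String {
    text.chars()
        .map(|ch| {
            if ch.is_ascii_lowercase() {
                let m = ch as u8 - b'a'; // map 'a'..='z' to 0..=25
                (b'a' + (m + shift % 26) % 26) as char
            } else {
                ch
            }
        })
        .collect()
}

/// Enc_k: c_i = [(m_i + k) mod 26]
fn shift_encrypt(plaintext: &str, k: u8) -> String {
    shift_letters(plaintext, k % 26)
}

/// Dec_k: m_i = [(c_i - k) mod 26], computed as a forward shift by 26 - k
/// so that the unsigned arithmetic never goes negative.
fn shift_decrypt(ciphertext: &str, k: u8) -> String {
    shift_letters(ciphertext, (26 - k % 26) % 26)
}
```

Implementing decryption as encryption by $26 - k$ is equivalent to subtracting $k$ modulo 26. It also avoids negative intermediate values with unsigned integers. For example, `shift_encrypt("attack at dawn", 3)` gives `dwwdfn dw gdzq`.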
+
+It is natural to ask whether this cipher is secure. The answer is no: there are only 26 possible keys, so the key space is far too small. You can even try all 26 keys on a given ciphertext by hand and check which resulting plaintext makes sense. Most likely only one will, and you will have recovered the original message.
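Trying all 26 keys is easy to automate. Only the final judgement of which candidate "makes sense" is left to a human. The following is a minimal sketch under the same lowercase-ASCII assumption; the helper names are hypothetical.

```rust
/// Decrypts `ciphertext` under the shift key `k`; non-letters pass through unchanged.
fn shift_decrypt(ciphertext: &str, k: u8) -> String {
    ciphertext
        .chars()
        .map(|ch| {
            if ch.is_ascii_lowercase() {
                let c = ch as u8 - b'a';
                // (c + 26 - k) mod 26 == (c - k) mod 26, without going negative
                (b'a' + (c + 26 - k % 26) % 26) as char
            } else {
                ch
            }
        })
        .collect()
}

/// Exhaustive key search: the candidate plaintext for every key in {0, ..., 25}.
fn brute_force_shift(ciphertext: &str) -> Vec<(u8, String)> {
    (0..26u8).map(|k| (k, shift_decrypt(ciphertext, k))).collect()
}
```

For the ciphertext `dwwdfn dw gdzq`, the candidate for key 3 reads `attack at dawn`.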
+
+Another method to crack this cipher is frequency analysis. Since the shift cipher maps each letter to exactly one other letter, the frequency distribution of the letters is preserved, merely shifted. For example, the most common letter in English is "e". If we analyse the ciphertext and discover that its most common letter is "g", then "g" is most likely the encryption of "e", which gives the key 2. This is not certain, since a particular plaintext (and therefore its ciphertext) may deviate from the typical distribution. The same reasoning can be applied to the remaining letters of the ciphertext to retrieve the original plaintext, and the whole process can be automated with some math.
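The single-letter heuristic from this paragraph can be sketched as follows. It assumes, as the text does, that `e` is the most frequent plaintext letter. The function name and the toy ciphertext used to check it are made up for illustration.

```rust
/// Guesses the shift key by assuming the most frequent ciphertext letter
/// is the encryption of 'e' (index 4). Returns None if there are no letters.
fn guess_key_from_most_common(ciphertext: &str) -> Option<u8> {
    let mut counts = [0usize; 26];
    for b in ciphertext.bytes().filter(|b| b.is_ascii_lowercase()) {
        counts[(b - b'a') as usize] += 1;
    }
    // Index and count of the most common letter (on ties, max_by_key keeps the last one).
    let (most_common, &count) = counts.iter().enumerate().max_by_key(|&(_, c)| *c)?;
    if count == 0 {
        return None;
    }
    // key = [(most_common - 'e') mod 26]
    Some(((most_common + 26 - 4) % 26) as u8)
}
```

The string `vjg vtgg uggu vjg ggn` is `the tree sees the eel` shifted by 2. Its most common letter is `g`, so the guess is 2. On short or unusual texts this guess can easily be wrong.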
+
+![](TODO)
+
+Let's once again map the alphabet to the integers 0 through 25, and this time let $p_i$ ($0 \leq p_i \leq 1$) denote the frequency of the $i$th letter in standard English text. Using the above table, we can calculate that
+
+$$
+\sum_{i=0}^{25} p_i^2 \approx 0.065
+$$
+
+Now, let $q_i$ denote the frequency of the $i$th letter in the ciphertext, i.e. the number of occurrences of the $i$th letter divided by the length of the ciphertext. If the key is $k$, then $p_i$ should be approximately equal to $q_{i+k}$, since the $i$th letter gets mapped to the $(i+k)$th letter. Strictly speaking, the index is $[(i+k) \mod 26]$; the reduction is omitted here to keep the notation light. Therefore, if we compute
+
+$$
+I_j \overset{\mathrm{def}}{=} \sum_{i=0}^{25} p_i \cdot q_{i+j}
+$$
+
+for every $j \in \{0,...,25\}$, then $I_k$ should be approximately equal to 0.065, where $k$ is the actual key. For $j \neq k$, the value $I_j$ is expected to differ noticeably from 0.065. Choosing the $j$ for which $I_j$ is closest to 0.065 therefore recovers the key, and this is easy to automate.
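Here is a sketch of the $I_j$ method in Rust. Because the frequency table above is missing, `ENGLISH_FREQ` uses commonly cited approximate English letter frequencies. This is an assumption; any reasonable table should work. The target value is computed as $\sum_i p_i^2$ from that table rather than hard-coded. The key returned is the $j$ whose $I_j$ is closest to that target. Function names are illustrative.

```rust
/// Approximate relative frequencies of 'a'..='z' in English text
/// (commonly cited textbook values; an assumption of this sketch).
const ENGLISH_FREQ: [f64; 26] = [
    0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061, 0.070, 0.002, 0.008, 0.040, 0.024,
    0.067, 0.075, 0.019, 0.001, 0.060, 0.063, 0.091, 0.028, 0.010, 0.024, 0.002, 0.020, 0.001,
];

/// q_i: frequency of the i-th letter among the lowercase letters of `text`.
fn letter_frequencies(text: &str) -> [f64; 26] {
    let mut counts = [0usize; 26];
    for b in text.bytes().filter(|b| b.is_ascii_lowercase()) {
        counts[(b - b'a') as usize] += 1;
    }
    let total: usize = counts.iter().sum();
    let mut q = [0.0f64; 26];
    if total > 0 {
        for (qi, &c) in q.iter_mut().zip(counts.iter()) {
            *qi = c as f64 / total as f64;
        }
    }
    q
}

/// I_j = sum_i p_i * q_{[(i + j) mod 26]}
fn correlation(q: &[f64; 26], j: usize) -> f64 {
    (0..26).map(|i| ENGLISH_FREQ[i] * q[(i + j) % 26]).sum()
}

/// Returns the j whose I_j is closest to sum_i p_i^2 (about 0.065).
fn recover_shift_key(ciphertext: &str) -> u8 {
    let target: f64 = ENGLISH_FREQ.iter().map(|p| p * p).sum();
    let q = letter_frequencies(ciphertext);
    (0..26)
        .min_by(|&a, &b| {
            let da = (correlation(&q, a) - target).abs();
            let db = (correlation(&q, b) - target).abs();
            da.partial_cmp(&db).unwrap()
        })
        .unwrap() as u8
}
```

With these frequencies, $\sum_i p_i^2 \approx 0.0657$. Consider a sentence of about 90 letters, such as the opening sentence of *Pride and Prejudice* ("it is a truth universally acknowledged ..."). For such a sentence, the correct shift gives $I_k$ around 0.063, while every other $I_j$ stays at or below roughly 0.045.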
+
+# The Vigenère Cipher
+
+This cipher is a more advanced version of the shift cipher: it is a *polyalphabetic* shift cipher. Unlike the shift cipher, it does not define a fixed mapping on a letter-by-letter basis. Instead, it maps blocks of letters whose size depends on the key length. For example, `ab` could be mapped to `xy`, `ac` to `zt`, and `aa` to `bc`. Moreover, identical blocks can be mapped to different blocks depending on their position in the plaintext relative to the key: `ab` could once be mapped to `xy`, but when `ab` appears again, it may be mapped to `ci`.
+
+In the Vigenère cipher the key is no longer a single number but a string of letters, each of which is again mapped to an integer in $\{0,...,25\}$. The key is repeated until it covers the whole plaintext, and each plaintext letter is shifted by the amount given by the key letter it is matched with. In the example below the key is `coke`. Spaces and punctuation are left as they are and do not consume key letters.
+
+```
+Plaintext:  the golden sun shone brightly, bathing the beach in its warm sunlight
+Key:        cok ecokec oke cokec okecokec, okecoke cok ecoke co kec okec okecokec
+Ciphertext: vvo kqznip ger uvyrg pbmivdpa, pkxjwxk vvo fgoml kb sxu kkvo gernwqlv
+```
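The encryption in the example can be reproduced with a short Rust sketch. It assumes a non-empty lowercase key. Like the example, it leaves spaces and punctuation unchanged without consuming key letters. The function names are illustrative.

```rust
/// Vigenère encryption over lowercase ASCII letters. Key letters are mapped to
/// shifts 0..=25 and reused cyclically; non-letters are copied unchanged and do
/// not advance the key (assumes a non-empty, lowercase key).
fn vigenere_encrypt(plaintext: &str, key: &str) -> String {
    let shifts: Vec<u8> = key.bytes().map(|b| b - b'a').collect();
    let mut pos = 0; // how many key letters have been used so far
    plaintext
        .chars()
        .map(|ch| {
            if ch.is_ascii_lowercase() {
                let k = shifts[pos % shifts.len()];
                pos += 1;
                (b'a' + (ch as u8 - b'a' + k) % 26) as char
            } else {
                ch
            }
        })
        .collect()
}

/// Decryption shifts each letter backwards by its key letter, implemented as
/// encryption with the "inverse" key whose shifts are 26 - k (mod 26).
fn vigenere_decrypt(ciphertext: &str, key: &str) -> String {
    let inverse: String = key
        .bytes()
        .map(|b| (b'a' + (26 - (b - b'a')) % 26) as char)
        .collect();
    vigenere_encrypt(ciphertext, &inverse)
}
```

Encrypting the plaintext above with the key `coke` yields exactly the ciphertext shown. Decrypting that ciphertext with `coke` gives the plaintext back.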
+
+Given a known key length $t$, also called the period, a ciphertext $c = c_1 \cdot\cdot\cdot c_l$ can be divided into consecutive blocks of length $t$. Ciphertext characters occupying the same position within their blocks have all been encrypted using the same shift amount. In the above example, take the plaintext blocks `theg` and `olde`: `t` and `o` were both encrypted with `c`, `h` and `l` with `o`, and so on. Such characters are said to form a *stream*. Stated more formally, for every $j \in \{1,...,t\}$, the ciphertext characters $c_j,c_{j+t},c_{j+2t},...$ have all been encrypted by shifting the corresponding plaintext characters by $k_j$ positions, where $k_j$ is the $j$th character of the key $k$. Frequency analysis can then be applied to each stream separately, checking which shift amount yields the expected letter distribution.
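Splitting a ciphertext into its $t$ streams takes only a few lines. In this sketch, the helper names are hypothetical, and non-letters are dropped first because they were never encrypted. Streams are indexed from 0, so `streams(c, t)[0]` is the stream $c_1, c_{1+t}, c_{1+2t}, \ldots$ of the text.

```rust
/// Splits the letters of `ciphertext` into `t` streams; stream j (0-based)
/// holds every t-th letter starting from letter j.
fn streams(ciphertext: &str, t: usize) -> Vec<String> {
    let mut out = vec![String::new(); t];
    for (idx, ch) in ciphertext.chars().filter(|c| c.is_ascii_lowercase()).enumerate() {
        out[idx % t].push(ch);
    }
    out
}

/// Shifts every letter of a single (lowercase) stream back by `k` positions.
fn unshift(stream: &str, k: u8) -> String {
    stream
        .bytes()
        .map(|b| (b'a' + (b - b'a' + 26 - k % 26) % 26) as char)
        .collect()
}
```

Take the example ciphertext with $t = 4$. The first stream is `vqpugiajvgkuonv`, and all of its letters were shifted by `c` = 2. Shifting them back yields `tonsegyhteismlt`, i.e. every fourth letter of the plaintext.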
+
+If the period $t$ is not known, it may be possible to determine it using Kasiski's method. First, identify repeated patterns of length 2 or 3 in the ciphertext. Kasiski observed that the distance between such repeated patterns (provided they are not coincidental) is a multiple of the period $t$. In the above example, the distance between the two `vvo`s is 32 letters, which is 8 times the period 4.
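Kasiski's method can be automated by recording the distance between occurrences of every trigram that repeats. The sketch below uses illustrative names and looks at trigrams only, not pairs. It also takes the greatest common divisor of all distances as a candidate for $t$.

```rust
use std::collections::HashMap;

/// For every trigram that occurs more than once among the letters of
/// `ciphertext`, records the distance between consecutive occurrences.
fn repeated_trigram_distances(ciphertext: &str) -> Vec<usize> {
    let letters: Vec<u8> = ciphertext.bytes().filter(|b| b.is_ascii_lowercase()).collect();
    let mut positions: HashMap<&[u8], Vec<usize>> = HashMap::new();
    for (i, window) in letters.windows(3).enumerate() {
        positions.entry(window).or_default().push(i);
    }
    let mut distances: Vec<usize> = positions
        .values()
        .flat_map(|pos| pos.windows(2).map(|w| w[1] - w[0]))
        .collect();
    distances.sort_unstable();
    distances
}

fn gcd(a: usize, b: usize) -> usize {
    if b == 0 {
        a
    } else {
        gcd(b, a % b)
    }
}

/// The gcd of all distances: a multiple of the period t (if the repeats are not coincidental).
fn kasiski_estimate(ciphertext: &str) -> Option<usize> {
    let d = repeated_trigram_distances(ciphertext);
    if d.is_empty() {
        None
    } else {
        Some(d.into_iter().fold(0, gcd))
    }
}
```

On the example ciphertext, two trigrams repeat: `vvo` (distance 32) and `ger` (distance 40, from `sun` and `sunlight`). Both distances are multiples of the period 4, but their gcd is 8. The gcd only yields a *multiple* of $t$, so its divisors are worth trying as well.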
+
+There is also an approach that is easier to automate. Recall that, given a period $t$, the ciphertext characters in the first stream
+$$c_1,c_{1+t},c_{1+2t},...$$
+are the result of encrypting the corresponding plaintext characters with the same shift amount. Therefore, the frequencies of the characters in the stream will be close to the letter frequencies of the English language, in some shifted order.
+
+Let $q_i$ denote the observed frequency of the $i$th letter of the alphabet in the stream (its number of occurrences divided by the length of the stream). We would then expect that $q_{(i+j) \mod 26} \approx p_i$, where $j$ is the shift amount and $p_i$ is the frequency of the $i$th letter of the alphabet in standard English text. In other words, the sequence $q_0,q_1,...,q_{25}$ is approximately the sequence $p_0,p_1,...,p_{25}$ shifted by $j$.
+
+Since the $q_i$ are approximately a permutation of the $p_i$, and permuting the terms does not change the sum, the previous analysis gives
+
+$$
+\sum_{i=0}^{25}q_i^2 \approx \sum_{i=0}^{25}p_i^2 \approx 0.065
+$$
+
+This gives a way to find the period $t$. For every candidate $\tau = 1,2,...$, consider the stream $c_1,c_{1+\tau},c_{1+2\tau},...$, let $q_i$ be the frequency of the $i$th letter in it, and define
+
+$$
+S_{\tau} \overset{\mathrm{def}}{=} \sum_{i=0}^{25}q_i^2
+$$
+
+When $\tau = t$, we expect $S_{\tau} \approx 0.065$, and the same holds when $\tau$ is a multiple of $t$. In the remaining cases, the stream mixes letters encrypted with different shifts. We would therefore expect its character distribution to be fairly uniform (the Vigenère cipher smooths out character distributions), and so
+
+$$
+S_{\tau} \approx \sum_{i=0}^{25} \left(\frac{1}{26}\right)^2 = \frac{1}{26} \approx 0.038
+$$
+
+Ergo, the smallest value of $\tau$ for which $S_{\tau} \approx 0.065$ is likely the period $t$. This can be further validated by performing the same procedure on the other streams of the ciphertext, such as $c_2,c_{2+\tau},c_{2+2\tau},...$ and so on.
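The following sketch computes $S_\tau$ and takes the smallest $\tau$ that lands near 0.065. The `tolerance` parameter and the cut-off `max_tau` are assumptions of this sketch, not part of the method as described. Function names are illustrative.

```rust
/// S_tau = sum_i q_i^2, where q_i is the frequency of the i-th letter in the
/// first stream c_1, c_{1+tau}, c_{1+2tau}, ... of the ciphertext letters (tau >= 1).
fn s_tau(ciphertext: &str, tau: usize) -> f64 {
    let mut counts = [0usize; 26];
    let mut total = 0usize;
    let letters = ciphertext.bytes().filter(|b| b.is_ascii_lowercase());
    for b in letters.step_by(tau) {
        counts[(b - b'a') as usize] += 1;
        total += 1;
    }
    if total == 0 {
        return 0.0;
    }
    counts
        .iter()
        .map(|&c| {
            let q = c as f64 / total as f64;
            q * q
        })
        .sum()
}

/// Smallest tau in 1..=max_tau whose S_tau lies within `tolerance` of 0.065.
fn guess_period(ciphertext: &str, max_tau: usize, tolerance: f64) -> Option<usize> {
    (1..=max_tau).find(|&tau| (s_tau(ciphertext, tau) - 0.065).abs() < tolerance)
}
```

This statistic needs long ciphertexts. On the 57-letter example above, the whole ciphertext ($\tau = 1$) already gives $S_1 = 209/3249 \approx 0.064$, even though the true period is 4. The reason is that small samples inflate $\sum_i q_i^2$. With much longer ciphertexts, the contrast between roughly 0.065 and 0.038 becomes far clearer.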
diff --git a/vault/Computer Science/Cryptography/Hash Functions/Birthday Attacks.md b/vault/Computer Science/Cryptography/Hash Functions/Birthday Attacks.md
@@ -0,0 +1,142 @@
+# Introduction
+
+As with ordinary ciphers, there is a trivial brute-force attack that finds a collision in any hash function $H$. If the hashes produced by $H$ are all of length $l_{\text{out}}$, then to find a collision we can simply evaluate $H$ on $2^{l_{\text{out}}}+1$ different inputs. Since there are only $2^{l_{\text{out}}}$ possible hashes, at least two inputs must have produced the same hash (by the pigeonhole principle), and our job is done.
+
+Usually, we are not particularly worried about this attack because it takes $O(2^{l_{\text{out}}})$ steps to execute. However, it turns out that there is a much more efficient attack that can find a collision in any hash function.
+
+## The Birthday Paradox
+To illustrate the attack, we are going to answer the following question: given $q$ people in a room, what is the probability that two of them share a birthday? This is equivalent to asking how likely it is that, among $q$ messages $m_1, m_2, ..., m_q$, two produce a collision in the hash function $H$. The hash values play the role of birthdays, assuming that $H$ spreads the messages (close to) uniformly over its outputs.
+
+We assume that every birthday is equally likely and that there are only $B = 365$ possible birthdays (those of a non-leap year). The probability that at least two people share a birthday is one minus the probability that *no* two people share a birthday. In hashing terms, the probability of a collision is one minus the probability that there is *no* collision amongst the $q$ messages $m_1, ..., m_q$:
+
+$$
+\Pr[\text{Coll}] = 1 - \Pr[\text{NoColl}_q]
+$$
+
+Imagine the people entering the room one by one (or equivalently, the messages being generated independently one after the other). The probability that there is no collision in the birthdays of the $q$ people is the probability that there is no collision in the birthdays of the first $q-1$ people *and* that the $q$-th person's birthday also does not collide with the previous birthdays, i.e.
+
+$$
+\Pr[\text{NoColl}_q] = \Pr[\text{NoColl}_{q-1}]\times \frac{B-q+1}{B}
+$$
+
+This is true because if there were no collisions in the first $q-1$ people, then there must be $q-1$ unique birthdays and so the probability that the $q$-th person's birthday is also unique is $\frac{B-(q-1)}{B} = \frac{B-q+1}{B}$. This logic can be continued until we reach the first person. Therefore,
+
+$$
+\Pr[\text{NoColl}_q] = 1\times \frac{B-1}{B} \times \frac{B-2}{B}\times\cdots \times\frac{B-q+2}{B}\times \frac{B-q+1}{B}
+$$
+
+The 1 at the beginning is the probability that the first person's birthday does *not* collide with anyone else's when entering the room. This probability is 100%, since the room is empty until the first person enters. The whole probability can be rewritten as the following product:
+
+$$
+\Pr[\text{NoColl}_q] = \prod_{i=1}^{q-1}\left(1-\frac{i}{B}\right)
+$$
+
+Therefore, the probability that a collision *does* occur can be written as
+
+$$
+\Pr[\text{Coll}] = 1 - \prod_{i=1}^{q-1}\left(1-\frac{i}{B}\right)
+$$
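The exact expression is easy to evaluate numerically, which gives a reference point for the bound derived below. Here is a short Rust sketch; the function name is illustrative.

```rust
/// Exact probability that at least two of q values, chosen uniformly and
/// independently from a set of size b, coincide: 1 - prod_{i=1}^{q-1} (1 - i/b).
fn collision_probability(q: u64, b: u64) -> f64 {
    let b = b as f64;
    let no_collision: f64 = (1..q).map(|i| 1.0 - i as f64 / b).product();
    1.0 - no_collision
}
```

For $B = 365$ this gives about $0.476$ for $q = 22$ and about $0.507$ for $q = 23$.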
+
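+We are now going to use a well-known inequality, which we take for granted since proving it is out of scope: $1-x \le e^{-x}$ for every real $x$. We apply it with $x = \frac{i}{B}$ to every factor of the product. The factors are non-negative when $q \le B$, and for $q > B$ the product is $0$ anyway. We get that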
+
+$$
+1 - \prod_{i=1}^{q-1}\left(1-\frac{i}{B}\right) \ge 1 - \prod_{i=1}^{q-1}e^{-\frac{i}{B}}
+$$
+
+What is nice about exponential functions with the same base is that when multiplying them, the exponents simply add, yielding
+
+$$
+1 - \prod_{i=1}^{q-1}e^{-\frac{i}{B}} = 1 - e^{-\frac{1}{B}\sum_{i=1}^{q-1}i} = 1 - e^{-\frac{1}{B}\frac{q(q-1)}{2}}
+$$
+
+Recall that $1 - \prod_{i=1}^{q-1}\left(1-\frac{i}{B}\right)$ is exactly $\Pr[\text{Coll}]$. Chaining the inequality and the equality above therefore gives
+
+$$
+\Pr[\text{Coll}] \ge 1 - e^{-\frac{q(q-1)}{2B}}
+$$
+
+While we did not obtain an exact equation for the value of $\Pr[\text{Coll}]$, we did obtain a lower bound for it! For large $q$ we have $q(q-1) \approx q^2$, so the bound is often quoted in the approximate form $1 - e^{-\frac{q^2}{2B}}$. This form is only an approximation, not a lower bound itself, because $\frac{q(q-1)}{2} < \frac{q^2}{2}$.
+
+>[!THEOREM] Birthday Theorem
+>
+>Given $q$ elements which are uniformly and independently chosen from a set of $B$ possible elements, the probability that at least two of them are the same is at least $1 - e^{-\frac{q(q-1)}{2B}}$.
+>
+
+Now let's put the theorem to work. How many people do we need in the room for there to be at least a 50% chance that two of them share a birthday? Plug in $B = 365$ and require
+
+$$
+1 - e^{-\frac{q(q-1)}{2\cdot365}} \ge \frac{1}{2} \iff q(q-1) \ge 730 \ln 2 \approx 506
+$$
+
+The smallest integer satisfying this is $q = 23$, since $23 \cdot 22 = 506$ while $22 \cdot 21 = 462$. We need only 23 people for there to be a 50% chance of two of them sharing a birthday!
+
+# Naive Birthday Attack
+
+If we have a hash function $H$ with outputs of length $l_{\text{out}}$, then in order to have a 50% chance of a collision, we need $q \approx 1.2\times2^{\frac{1}{2}l_{\text{out}}}$ different messages (this can be obtained from the Birthday theorem bound by setting $B = 2^{l_{\text{out}}}$). 
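To see where the numbers 23 and $1.2$ come from, the sketch below finds the smallest $q$ for which the bound $1 - e^{-q(q-1)/(2B)}$ from the derivation reaches $\frac{1}{2}$. It uses a simple linear search. The function names are illustrative.

```rust
/// Lower bound from the derivation: 1 - e^{-q(q-1)/(2B)}.
fn birthday_lower_bound(q: u64, b: f64) -> f64 {
    let q = q as f64;
    1.0 - (-(q * (q - 1.0)) / (2.0 * b)).exp()
}

/// Smallest q for which the lower bound reaches 1/2.
fn min_q_for_half(b: f64) -> u64 {
    let mut q = 1;
    while birthday_lower_bound(q, b) < 0.5 {
        q += 1;
    }
    q
}
```

It returns 23 for $B = 365$. For $B = 2^{32}$ (a 32-bit hash) it returns a $q$ of about $1.18 \cdot 2^{16}$. This matches the factor $\sqrt{2\ln 2} \approx 1.18$ obtained by solving $1 - e^{-q^2/(2B)} = \frac{1}{2}$.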
+
+The naive birthday attack does precisely this. First, it chooses $q \approx 1.2\times2^{l_{\text{out}}/2}$ distinct messages $m_1, m_2, ..., m_q$. It then computes their hashes $h_1, h_2, ..., h_q$. Finally, it looks for a pair $i \ne j$ with $h_i = h_j$. With probability approximately $\frac{1}{2}$ it finds such a collision; if it does not, it simply starts over with fresh messages. On average, the attack needs just 2 iterations to find a colliding pair, so its expected running time is $O(2^{l_{\text{out}}/2})$. Compare that to the brute-force approach, whose running time is $O(2^{l_{\text{out}}})$.
+
+This variant is called *naive* because of its huge space complexity, namely $O(2^{l_{\text{out}}/2})$: the algorithm has to store all the computed hashes while checking them for collisions.
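Below is a toy demonstration of the naive attack. Real hash functions have outputs far too long for this. The sketch therefore uses a hypothetical `toy_hash` (64-bit FNV-1a truncated to $l_{\text{out}} = 24$ bits) and the integers $0, 1, 2, \ldots$ as distinct messages. It does not hash a fixed batch and restart on failure. Instead, it uses the common variant that stops at the first repeated hash; time and memory are still $O(2^{l_{\text{out}}/2})$.

```rust
use std::collections::HashMap;

/// Hypothetical toy hash: 64-bit FNV-1a truncated to 24 bits (l_out = 24).
fn toy_hash(message: &[u8]) -> u32 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for &byte in message {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a 64-bit prime
    }
    (hash & 0x00ff_ffff) as u32
}

/// Naive birthday attack: hash distinct messages and store every hash
/// (the O(2^{l_out/2}) memory cost) until one of them repeats.
fn naive_birthday_attack() -> (u64, u64) {
    let mut seen: HashMap<u32, u64> = HashMap::new();
    for i in 0u64.. {
        let h = toy_hash(&i.to_le_bytes());
        // `insert` returns the message previously stored under the same hash, if any.
        if let Some(previous) = seen.insert(h, i) {
            return (previous, i);
        }
    }
    unreachable!("by the pigeonhole principle a repeat occurs within 2^24 + 1 messages")
}
```

The `HashMap` holds every hash computed so far. It is the memory cost that the small-space variant avoids.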
+
+>[!WARNING]
+>
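+>Since the birthday attack is generic and works against any hash function, it is used instead of the simple brute-force attack as the gold standard when creating security proofs. Collisions in a hash function with $l_{\text{out}}$-bit outputs can always be found with roughly $2^{l_{\text{out}}/2}$ work, so this, not $2^{l_{\text{out}}}$, is the best collision resistance one can hope for.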
+>
+
+## Small-Space Birthday Attack
+
+There is an improved version of the birthday attack which has approximately the same probability of success and running time but only takes a *constant* amount of memory. This attack uses [Floyd's cycle finding algorithm](https://en.wikipedia.org/wiki/Cycle_detection).
+
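+Begin by choosing a random initial value $x_0$ and set $x \coloneqq x_0, x' \coloneqq x_0$. At the $i$-th iteration, compare the values $x_i = H(x_{i-1})$ and $x_i' = H(H(x_{i-1}'))$; in other words, $x$ advances one step along the sequence $x_0, H(x_0), H(H(x_0)), \ldots$ while $x'$ advances two. If $x_i = x_i'$, then there must have been a collision somewhere along the way. It might simply be that $x_{i-1} \ne H(x_{i-1}')$, in which case we have immediately found the collision pair $x_{i-1}, H(x_{i-1}')$, since both hash to $x_i$. However, it could also be that $x_{i-1} = H(x_{i-1}')$, in which case the actual collision, i.e. the two different inputs that produced the same hash, happened earlier. Since we did not store the hashes we burnt through, we need to walk the sequence again to find precisely which ones collide.
+
+Store the index $i$ at which $x_i = x_i'$. Reset $x \coloneqq x_0$, but leave $x'$ at the meeting point $x_i$; resetting both to $x_0$ would keep them equal forever. Now iterate at most $i$ times. At each step, check whether $H(x) = H(x')$. If it is and $x \ne x'$, we have our collision: simply return $x$ and $x'$. Otherwise, set $x \coloneqq H(x)$ and $x' \coloneqq H(x')$, so that the two values stay exactly $i$ steps apart in the sequence. The only failure case is when $x_0$ itself lies on the cycle, so that $x = x'$ already at the first step. This happens with small probability, and the attack is then restarted with a new $x_0$.
+
+```rust
+/// Hypothetical stand-in for H: 64-bit FNV-1a over the input's bytes, truncated
+/// to 24 bits so that a collision turns up quickly. Its outputs are valid inputs again.
+fn h(x: u64) -> u64 {
+    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
+    for byte in x.to_le_bytes() {
+        hash ^= u64::from(byte);
+        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
+    }
+    hash & 0x00ff_ffff
+}
+
+/// Small-space birthday attack based on Floyd's cycle finding. Returns two
+/// distinct inputs with the same hash, or `None` if `x_0` lies on the cycle.
+fn small_space_birthday_attack(x_0: u64) -> Option<(u64, u64)> {
+    // Phase 1: x moves one step and x_prime two steps per iteration,
+    // until x_i = x_{2i}.
+    let mut x = x_0;
+    let mut x_prime = x_0;
+    let mut i: u64 = 0;
+    loop {
+        x = h(x);
+        x_prime = h(h(x_prime));
+        i += 1;
+        if x == x_prime {
+            break;
+        }
+    }
+
+    // Phase 2: restart x at x_0, but keep x_prime at the meeting point x_i.
+    // Both now move one step at a time, staying i positions apart.
+    x = x_0;
+    for _ in 0..i {
+        if h(x) == h(x_prime) {
+            // x == x_prime only happens when x_0 is on the cycle (no real collision).
+            return if x != x_prime { Some((x, x_prime)) } else { None };
+        }
+        x = h(x);
+        x_prime = h(x_prime);
+    }
+    None
+}
+```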
+
+This attack uses much less memory than the naive method because it only needs to store the initial value $x_0$, the two values $x$ and $x'$ being compared, and the counter $i$. As before, with probability around $50\%$ a collision occurs within the first $\approx 2^{l_{\text{out}}/2}$ values of the sequence, so the running time remains $O(2^{l_{\text{out}}/2})$.