* Know what is Birthday Paradox

Birthday Paradox is one of the most paradox we talk about in cryptography
some people treats it as the most powerful attack on hash function, but is it really powerful?!! and why just on hashes why not other functions

first of all we have to define the paradox

we have $$ X = {0,1,...,n-1} $$ and we are asking, if we picked independent picks, how many picks we need so we get one value twice ?

the picks are independent and the element we pick from the set we return it back to the set so always \( |X| = n \)

like that we know" from the paradox" that if we have \( k =\sqrt {n}\)  elements there is 50% chance that two picks give us the same element whatever this element is. here comes the crucial point, how we pick!!, the picks must be uniform distributed and this is a crucial requirement

Here we will ask a question. if we need \( k\) elements and those elements have two elements have the same value, we don't know which, but there are two elements with the same value, what we have to do to know the Duplicated value?!!
-We have to search through the elements to find this two identical values,
so how to search?

1- regular search which will be \(O(k^2)\) which equals \(O(n)\)

2- sorting then searching
   I-  Bubble sorting then searching by comparing i, i+1 for the two identical values
   II- Merge sorting then searching by comparing i, i+1 for the two identical values

   of course the option of Merge sorting then searching is the best, it needs
       1- Picking k elements \(O(k) = O(\sqrt {n})\)
       2- Sorting \( O(k log(k) ) = O( \sqrt{n} log(\sqrt{n}) ) \)
       3- Searching \( O(k) = O ( \sqrt {n} ) \)
       all of that equals  \( O(k+k log(k) +k) \)

what i have to say before continue that birthday paradox is not some thing you can use or not to use, it 's how math works :D
if pick are not uniformly distributed that will cause bias any maybe the bias gives you advantage, but we restrict ourselves to uniform distribution for easy calculation and because hash functions output is uniform distributed and this is a crucial point when constructing hash functions - output must be uniformly distributed-  

Now about Birthday paradox in Hashing
First, what we are exploiting?!!
we are exploiting Collision and here we really really need to talk about collision and what it is.
Collision is mandatory in Hash Functions because we map a set to a smaller set so any hash function has collision - in math base -
\( H:X \to Y \)
\(X:\{0,1\}^m\) ,\(Y:\{0,1\}^n\) \(\backepsilon\) \(m > n\)

but in cryptography when we say that hash function is collision resistance we mean that attack can't find a fast way to compute the collision because any hash function must has more than 1 input map to same output - in math base -, but actually it 's not the thing i want to clarify :D

if i asked, can we use collision to bypass password hashing or integrity check what you will say
-yes ?!
It's Wrong we can't because that it's not how collision works
collision is giving you the name of the hash function and asking you to find  \( x_0\)and  \( x_1\)
H(\( x_0\)) = H( \( x_1\) )
Any \(x_1\) and \(x_0\) and that what Birthday Paradox about and this is not threat model in storing passwords or integrity

Complexity of Birthday paradox on hash function ex: MD5

MD5 is 128-bit output so:

1- we compute 2^64 hash O(C * 2^64) \backepsilon C is number of steps to compute single hash

2- Merge sorting O(2^64 * log(2^64) )
3- Searching O(2^64)

total = O( C * 2^64 + 2^64 * log(2^64) +  2^64) = O( 2^64 [C + log(2^64) + 1] )
with 50% chance

** as you know from birthday paradox it's not restricted to 50% we have different size of subsets according to our chance - review birthday paradox -

**we don't care if input is not random we care about output and output in hash function is uniform distributed

Final Words
1- Collision is not serious problem to password storing or file integrity
2- Birthday paradox is mandatory not optional
3- the searching part is the optional part is Birthday paradox, and it's very crucial, because the complexity as runtime is heavily depends on it


[1]Birthday Paradox