👆 Digital Fingerprints
Hash Functions Explained
In 5 minutes: Understand how hash functions work and why they're useful
Prerequisite: None
🎯 The Simple Story
Bob wants to verify that the message he received from Alice wasn't changed along the way.
Problem: Eve could alter the message in transit!
Alice's idea: Digital fingerprints!
- Alice's message: "Hello"
- Alice computes: Hash("Hello") = "7f1a..."
- Alice sends: "Hello" + "7f1a..."
- Bob receives and computes: Hash("Hello")
- Bob compares: Does his hash match what Alice sent?
- If Eve changed "Hello" to "Help", the hash Bob computes would be completely different!
The hash is like a digital fingerprint!
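The steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming SHA-256 as the hash function (the story does not name one); the helper name `fingerprint` is made up for this example.

```python
import hashlib

def fingerprint(message: str) -> str:
    """Return the SHA-256 digest of a message as a hex string."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

# Alice's side: compute the fingerprint and send it with the message.
sent_message = "Hello"
sent_hash = fingerprint(sent_message)

# Bob's side: recompute the fingerprint and compare.
print(fingerprint("Hello") == sent_hash)  # same message, same hash
print(fingerprint("Help") == sent_hash)   # altered message, different hash
```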
🧠 Mental Model
Hold this picture in your head:
Hash Function (Digital Fingerprint):
Input: "Hello"
↓
Hash Function (H)
↓
Output: "7f1a23b5..."
Properties:
1. One-way: Can't go from "7f1a..." to "Hello"
2. Fixed length: Output is always the same size (256 bits for SHA-256)
3. Avalanche: One bit change → completely different output
4. Collision resistant: Hard to find two inputs with same output
Think of it like:
👆 Fingerprint (Unique identifier)
🔢 Digital digest (Small representation of big data)
🔥 Burn after computing (Can compute, can't reverse)
📊 See It Happen
Let's watch a hash function in action:
🎭 The Story: Fingerprinting Messages
Alice sends an important message to Bob.
The message: "Transfer $100 to Bob. -Alice"
Eve wants to change it to "Transfer $100 to Eve. -Alice"
Without hash: Eve changes the message, and Bob can't tell that it was modified!
With hash:
- Alice computes: Hash("Transfer $100 to Bob. -Alice") = "f8a2..."
- Alice sends: "Transfer $100 to Bob. -Alice" + hash: "f8a2..."
- Eve intercepts, changes to: "Transfer $100 to Eve. -Alice"
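- Eve forwards the old hash "f8a2..." unchanged. (This story assumes Eve can't replace the hash, for example because Alice shared it over a separate trusted channel. Hash functions are public, so if Eve could swap the hash too, she would simply compute Hash("Transfer $100 to Eve. -Alice") herself.)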
- Bob receives: "Transfer $100 to Eve. -Alice" + old hash: "f8a2..."
- Bob computes: Hash("Transfer $100 to Eve. -Alice") = "b3c9..."
- Bob compares: "f8a2..." vs "b3c9..." ✗ Doesn't match!
- Bob knows Eve tampered with the message!
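Bob's check can be sketched as follows. This is an illustration only: it assumes SHA-256, and it assumes the hash reaches Bob unmodified (a plain hash gives no protection if Eve can replace it as well). The function names are hypothetical.

```python
import hashlib
import hmac

def sha256_hex(message: str) -> str:
    """Hex-encoded SHA-256 digest of a text message."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

def is_untampered(received_message: str, trusted_hash: str) -> bool:
    """Recompute the hash of what arrived and compare it to the trusted one."""
    # compare_digest compares the two strings in constant time
    return hmac.compare_digest(sha256_hex(received_message), trusted_hash)

original = "Transfer $100 to Bob. -Alice"
trusted_hash = sha256_hex(original)  # assumed to reach Bob unmodified

tampered = "Transfer $100 to Eve. -Alice"
print(is_untampered(original, trusted_hash))  # message as Alice wrote it
print(is_untampered(tampered, trusted_hash))  # message after Eve's change
```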
🎮 Try It Yourself
Question 1: Alice's message: "Hello". Hash("Hello") = "a1b2...". Eve changes "Hello" to "Help". Does Hash("Help") equal "a1b2..."?
No! Not even close!
Remember: Hash functions have the avalanche property. One character change (or even one bit change) results in a completely different hash.
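Hash("Hello") might be: "a1b2c3d4..."
Hash("Help") might be: "9e8d7c6b..."
On average about half of the output bits flip, so the two hashes look unrelated (a few hex characters may still match by pure chance).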
Answer: No! Completely different (avalanche effect)
Question 2: Why can't Eve figure out the message from the hash?
Because hash functions are one-way!
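Given hash("Hello") = "a1b2c3d4...", Eve has no practical way to work backwards to "Hello". The hash function scrambles and discards information in a way that can't be undone.
Try reversing:
- "a1b2..." → ???
- No known efficient computation converts "a1b2..." to "Hello"
- Eve's only option is to guess inputs and hash each guess, which is hopeless for long, unpredictable messages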
It's like blending orange juice:
- Can blend oranges → orange juice
- Can't unblend orange juice → oranges
Answer: Hash functions are one-way (can't reverse)
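To make "can't reverse" concrete: the only generic attack is guessing. The sketch below assumes SHA-256 and uses a made-up list of guesses; `guess_preimage` is a hypothetical helper, not a library function. It succeeds only when the real message happens to be among the guesses, which is also why short or predictable inputs are not protected by hashing alone.

```python
import hashlib

def sha256_hex(message: str) -> str:
    """Hex-encoded SHA-256 digest of a text message."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

def guess_preimage(target_hash, candidates):
    """Hash each candidate; return the one that matches, else None."""
    for candidate in candidates:
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None

target = sha256_hex("Hello")

# Works only because "Hello" is on the (hypothetical) guess list.
found = guess_preimage(target, ["Hi", "Hey", "Hello", "Help"])
# No guess matches, so nothing is learned about the input.
missed = guess_preimage(target, ["Goodbye", "Thanks"])
print(found, missed)
```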
Question 3: What happens if two different messages have the same hash?
This is called a collision!
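Collisions must exist, because infinitely many possible inputs map to a fixed number of outputs. But for good hash functions (like SHA-256), nobody knows how to find one. The probability that two given messages share a hash is about 1 in 2^256, roughly 1 in 10^77.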
That's like flipping a coin 256 times and getting heads every time. Basically impossible!
Modern hash functions (SHA-256, SHA-3, BLAKE2) are designed to make collisions astronomically unlikely.
Answer: Practically impossible with good hash functions like SHA-256
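A small experiment shows why output size matters. The sketch below deliberately weakens SHA-256 by keeping only its first 2 bytes (16 bits), a toy hash invented for this example. By the birthday bound, an n-bit hash collides after roughly 2^(n/2) attempts: a few hundred for 16 bits, but about 2^128 for the full 256 bits.

```python
import hashlib

def truncated_hash(data: bytes, num_bytes: int) -> bytes:
    """First num_bytes of SHA-256: a deliberately weakened toy hash."""
    return hashlib.sha256(data).digest()[:num_bytes]

def find_collision(num_bytes: int):
    """Hash counter values until two different inputs share a truncated hash."""
    seen = {}
    counter = 0
    while True:
        data = str(counter).encode("ascii")
        h = truncated_hash(data, num_bytes)
        if h in seen:
            return seen[h], data
        seen[h] = data
        counter += 1

a, b = find_collision(2)  # 16-bit output, so a collision turns up quickly
print(a, b, truncated_hash(a, 2).hex())

# Size of the full SHA-256 output space, for comparison
print(f"{2**256:.3e}")
```

The two colliding inputs still have different full SHA-256 digests; only the shortened 16-bit version collides.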
🔢 The Math
Hash Function Properties
A cryptographic hash function H must have:
1. Pre-image resistance (One-way):
Given y = H(x), finding x is hard.
Example: Given hash "a1b2c3d4...", find the input message.
2. Second pre-image resistance:
Given x and y = H(x), finding x' ≠ x with H(x') = y is hard.
Example: Given "Hello" and its hash "a1b2c3d4...", find another message with same hash.
3. Collision resistance:
Finding any x ≠ x' with H(x) = H(x') is hard.
Example: SHA-256
H("Hello") = 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d176482638...
256 bits = 32 bytes = 64 hex characters
For ANY input (1 byte or 1GB), output is always 256 bits!
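The fixed output size is easy to check with Python's standard `hashlib` module. The inputs below are arbitrary examples chosen for this sketch, ranging from an empty message to about 1 MB.

```python
import hashlib

inputs = [b"", b"H", b"Hello", b"x" * 1_000_000]  # 0 bytes up to ~1 MB

for data in inputs:
    digest = hashlib.sha256(data).digest()
    # Print input size, digest size in bytes, digest size in hex characters
    print(len(data), len(digest), len(digest.hex()))

# Collect the distinct digest sizes seen across all inputs
digest_lengths = {len(hashlib.sha256(d).digest()) for d in inputs}
print(digest_lengths)
```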
Avalanche Effect
Input: "Hello World" → Hash: 486ea462...
Input: "hello World" → Hash: 2c74fd17...
(Digests shortened and shown for illustration only; compute the real SHA-256 values with any hashing tool.)
Just changed 'H' to 'h' (a 1-bit difference in the input), and the hashes are completely different!
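The avalanche effect can be measured rather than eyeballed. This sketch, assuming SHA-256, counts how many bits differ between the two inputs and between their digests; `bit_difference` is a helper written for this example. For a good hash function, close to half of the 256 output bits are expected to flip.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

m1, m2 = b"Hello World", b"hello World"
d1 = hashlib.sha256(m1).digest()
d2 = hashlib.sha256(m2).digest()

input_bits_changed = bit_difference(m1, m2)   # 'H' is 0x48, 'h' is 0x68
output_bits_changed = bit_difference(d1, d2)  # out of 256 output bits

print(d1.hex())
print(d2.hex())
print(input_bits_changed, output_bits_changed)
```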
💡 Why We Care
Real-World Uses
| Use Case | How It Works | Example |
|---|---|---|
| Message integrity | Check if message tampered | Bob verifies hash matches |
| Password storage | Store a salted, deliberately slow hash of the password, never the password itself | Store salt + KDF("mypassword", salt) instead of "mypassword" |
| Key derivation | Derive keys from secrets | KDF(secret) = encryption key |
| Data deduplication | Same content = same hash | Block-level duplicate detection |
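As one example from the table, here is a rough sketch of password storage. It assumes PBKDF2-HMAC-SHA256 from Python's standard library as the slow, salted hash; the iteration count and the function names are illustrative choices, not a recommendation from the original text.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor; real systems tune this

def hash_password(password, salt=None):
    """Return (salt, digest) using salted PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest

def verify_password(password, salt, expected_digest):
    """Re-derive the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)

salt, stored = hash_password("mypassword")
print(verify_password("mypassword", salt, stored))
print(verify_password("wrongpassword", salt, stored))
```

The salt means two users with the same password get different stored values, and the iteration count makes each guess more expensive for an attacker.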
Signal Protocol Uses
The Signal Protocol builds on hash functions (through HMAC and HKDF, which are constructed from a hash such as SHA-256) for, in simplified form:
- Message key derivation:
message_key = HMAC(chain_key, 0x01)
- Chain key derivation:
next_chain_key = HMAC(chain_key, 0x02)
- Ciphertext verification:
Check a keyed hash (MAC) of the ciphertext, so any tampering is detected
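Chain key derivation can be sketched as a loop that keeps feeding the current key through a hash-based function. The sketch below assumes HMAC-SHA256 with the one-byte constants 0x01 and 0x02, which the Double Ratchet specification suggests as one option. It is a simplified illustration, not Signal's actual implementation, and the all-zero starting key is a placeholder.

```python
import hashlib
import hmac

def chain_step(chain_key: bytes):
    """Derive a message key and the next chain key from the current chain key."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key

chain_key = bytes(32)  # placeholder all-zero key, for demonstration only
message_keys = []
for _ in range(3):
    message_key, chain_key = chain_step(chain_key)
    message_keys.append(message_key)

for key in message_keys:
    print(key.hex())
```

Because each step is one-way, someone who learns a later chain key cannot work backwards to earlier message keys.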
✅ Quick Check
Can you explain hash functions to a 5-year-old?
Try saying this out loud:
"A hash function is like a magic fingerprint machine. You put a picture in, it prints out a special code. If someone changes the picture even a tiny bit, the code changes completely. And you can't use the code to get the picture back - it's a one-way street!"
What's the avalanche effect?
Example:
Input: "Hello" → Hash: "a1b2c3d4..."
Input: "Hellp" → Hash: "9e8d7c6b..."
One letter change = completely different hash output!
This is why Eve can't tamper with messages undetected - the hash would betray her.
📋 Key Takeaways
✅ Hash function = One-way digital fingerprint
✅ Fixed size output = Any input → 256 bits (SHA-256)
✅ Avalanche effect = One bit change → completely different hash
✅ Collision resistant = Infeasible to find two inputs with same hash
✅ Can't reverse = Hash → ??? (can't find input)
✅ Integrity check = Verify messages haven't changed
✅ Signal Protocol use = Key derivation and message verification
🎉 What You'll Learn Next
Now you understand hash functions! These are used throughout the Signal Protocol for:
- Deriving keys
- Verifying message integrity
- Chain key computation
Next, we'll learn about cryptographic signatures - how to verify who sent a message!
We'll learn how Alice can prove she really sent a message, not Eve!