
👆 Digital Fingerprints

Hash Functions Explained

In 5 minutes: Understand how hash functions work and why they're useful
Prerequisite: None


🎯 The Simple Story

Bob wants to be sure that a message from Alice arrives exactly as she wrote it.

Problem: Eve could change the message on its way to Bob!

Alice's idea: Digital fingerprints!

  1. Alice's message: "Hello"
  2. Alice computes: Hash("Hello") = "7f1a..."
  3. Alice sends: "Hello" + "7f1a..."
  4. Bob receives and computes: Hash("Hello")
  5. Bob compares: Does his hash match what Alice sent?
  6. If Eve changed "Hello" to "Help", Bob's hash would be completely different!

(This only works if Eve can't also replace the hash. Proving who really sent a message is the job of signatures, covered next.)

The hash is like a digital fingerprint!


🧠 Mental Model

Hold this picture in your head:

Hash Function (Digital Fingerprint):

Input: "Hello"
      ↓
Hash Function (H)
      ↓
Output: "7f1a23b5..."

Properties:
1. One-way: Can't feasibly go from "7f1a..." back to "Hello"
2. Fixed length: Always the same output size (256 bits for SHA-256)
3. Avalanche: One bit change → about half of the output bits flip
4. Collision resistant: Hard to find two inputs with the same output

Think of it like:

👆 Fingerprint (Unique identifier)

🔢 Digital digest (Small representation of big data)

🔥 Burn after computing (Can compute, can't reverse)


📊 See It Happen

Let's watch a hash function in action:
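Here is a minimal sketch of a hash function in action, using SHA-256 from Python's standard `hashlib` module. The helper name `fingerprint` is made up for this example.

```python
import hashlib

def fingerprint(message: str) -> str:
    """Return the SHA-256 digest of a message as 64 hex characters."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

# Same input -> same fingerprint; different input -> unrelated fingerprint.
for text in ["Hello", "Help", "Hello"]:
    print(f"{text!r:8} -> {fingerprint(text)}")
```

Run it and compare the three lines: the two "Hello" rows are identical, while the "Help" row looks nothing like them.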


🎭 The Story: Fingerprinting Messages

Alice sends an important message to Bob.

The message: "Transfer $100 to Bob. -Alice"

Eve wants to change it to "Transfer $100 to Eve. -Alice"

Without a hash: Eve changes the message, and Bob can't tell that it was modified!

With a hash (as long as Bob gets Alice's hash in a way Eve can't alter):

  1. Alice computes: Hash("Transfer $100 to Bob. -Alice") = "f8a2..."
  2. Alice sends: "Transfer $100 to Bob. -Alice" + hash: "f8a2..."
  3. Eve intercepts, changes to: "Transfer $100 to Eve. -Alice"
  4. Eve would also need to swap in Hash("Transfer $100 to Eve. -Alice"), but she can't change the hash Bob trusts
  5. Bob receives: "Transfer $100 to Eve. -Alice" + old hash: "f8a2..."
  6. Bob computes: Hash("Transfer $100 to Eve. -Alice") = "b3c9..."
  7. Bob compares: "f8a2..." vs "b3c9..." ✗ Doesn't match!
  8. Bob knows someone tampered with the message!
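Bob's check can be sketched in a few lines of Python. This is an illustration under one assumption: the hash Bob compares against reached him unmodified. The helper names `sha256_hex` and `matches` are made up for this example.

```python
import hashlib
import hmac

def sha256_hex(message: str) -> str:
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

def matches(message: str, expected_hash: str) -> bool:
    """Bob's check: recompute the hash and compare it with the one he was given."""
    return hmac.compare_digest(sha256_hex(message), expected_hash)

original = "Transfer $100 to Bob. -Alice"
tampered = "Transfer $100 to Eve. -Alice"

trusted_hash = sha256_hex(original)     # assumed to reach Bob unmodified

print(matches(original, trusted_hash))  # True: message is intact
print(matches(tampered, trusted_hash))  # False: message was changed

# Anyone can compute a hash, so if Eve can replace the hash too,
# her forgery passes the check:
eves_hash = sha256_hex(tampered)
print(matches(tampered, eves_hash))     # True
```

The last line is the reason a plain hash protects integrity only when the hash itself is trusted.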

🎮 Try It Yourself

Question 1: Alice's message: "Hello". Hash("Hello") = "a1b2...". Eve changes "Hello" to "Help". Does Hash("Help") equal "a1b2..."?


No! Not even close!

Remember: Hash functions have the avalanche property. One character change (or even one bit change) results in a completely different hash.

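Hash("Hello") might be: "a1b2c3d4..."
Hash("Help") might be: "z9y8x7w6..."

(These are made-up values for illustration. Real SHA-256 output uses only the hex digits 0-9 and a-f.)

On average, about half of the output bits differ, so the two hashes look unrelated. A few characters may still match by pure chance.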

Answer: No! Completely different (avalanche effect)


Question 2: Why can't Eve figure out the message from the hash?


Because hash functions are one-way!

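Given Hash("Hello") = "a1b2c3d4...", Eve has no practical way to run the function backwards and get "Hello". The hash function scrambles and discards information in a way that can't be undone.

Try reversing:

  • "a1b2..." → ???
  • No known efficient computation converts "a1b2..." back to "Hello"

The best Eve can do is guess inputs and hash each guess. That is hopeless for long, unpredictable messages, but short or predictable ones can still be guessed.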

It's like blending orange juice:

  • Can blend oranges → orange juice
  • Can't unblend orange juice → oranges

Answer: Hash functions are one-way (can't reverse)
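One-way does not mean unguessable. The sketch below shows the only generic attack available to Eve: hash candidate inputs until one matches. The helper `guess_input` and the candidate lists are hypothetical, chosen only for illustration.

```python
import hashlib
from typing import Iterable, Optional

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def guess_input(target_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash each candidate and return the one that matches, if any."""
    for candidate in candidates:
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None

leaked = sha256_hex("Hello")

print(guess_input(leaked, ["Hi", "Help", "Hello"]))  # Hello
print(guess_input(leaked, ["Hi", "Help"]))           # None
```

Eve never reverses anything here. She only succeeds when the real message happens to be on her list of guesses.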


Question 3: What happens if two different messages have the same hash?


This is called a collision!

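For good hash functions (like SHA-256), collisions are extremely unlikely. For any two different messages, the chance that they share a hash is about 1 in 2^256, which is roughly 1 in 10^77.

That's like flipping a coin 256 times and getting heads every time. Basically impossible!

Collisions must exist, because there are more possible messages than 256-bit outputs, but nobody knows a practical way to find one.

Modern hash functions (SHA-256, SHA-3, BLAKE2) are designed to make finding collisions computationally infeasible.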

Answer: Practically impossible with good hash functions like SHA-256
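To get a feel for why output size matters, here is a toy experiment: keep only the first 16 bits of SHA-256 and search for a collision. With so few bits, a collision typically turns up after a few hundred tries (the birthday effect, around 2^8 attempts for 16 bits). The same search against all 256 bits would need on the order of 2^128 attempts. The names `tiny_hash` and `find_collision` are made up for this sketch.

```python
import hashlib
from itertools import count

def tiny_hash(text: str) -> bytes:
    """First 2 bytes (16 bits) of SHA-256: a deliberately weak toy hash."""
    return hashlib.sha256(text.encode("utf-8")).digest()[:2]

def find_collision():
    """Try message-0, message-1, ... until two inputs share a tiny_hash."""
    seen = {}
    for i in count():
        text = f"message-{i}"
        h = tiny_hash(text)
        if h in seen:
            return seen[h], text, i + 1
        seen[h] = text

first, second, tries = find_collision()
print(f"{first!r} and {second!r} collide after {tries} tries")
```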


🔢 The Math

Hash Function Properties

A cryptographic hash function H must have:

1. Pre-image resistance (One-way):

Given y = H(x), finding x is hard.

Example: Given hash "a1b2c3d4...", find the input message.

2. Second pre-image resistance:

Given x and y = H(x), finding x' ≠ x with H(x') = y is hard.

Example: Given "Hello" and its hash "a1b2c3d4...", find another message with same hash.

3. Collision resistance:

Finding any x ≠ x' with H(x) = H(x') is hard.

Example: SHA-256

H("Hello") = 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d176482638...
256 bits = 32 bytes = 64 hex characters

For ANY input (1 byte or 1 GB), the output is always 256 bits!
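A quick way to check the fixed-length claim yourself. This sketch uses a 1 MB input instead of 1 GB to keep it fast; the list of inputs is arbitrary.

```python
import hashlib

inputs = [b"", b"a", b"Hello", b"x" * 1_000_000]

for data in inputs:
    digest = hashlib.sha256(data).digest()
    print(f"{len(data):>9} input bytes -> {len(digest)} digest bytes "
          f"({len(digest) * 8} bits)")
```

Every row reports 32 digest bytes, which is 256 bits, no matter how large the input is.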

Avalanche Effect

Input: "Hello" → Hash: 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969
Input: "hello" → Hash: 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

Changing 'H' to 'h' flips just 1 bit of the input, yet the two hashes are completely different!
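The avalanche effect can be measured by counting how many of the 256 output bits change. The sketch below applies the same 'H' to 'h' change to the string "Hello". The helper names are made up for this example, and for a well-designed hash the count should land somewhere near 128.

```python
import hashlib

def sha256_int(text: str) -> int:
    """SHA-256 digest interpreted as one big 256-bit integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")

def differing_bits(a: str, b: str) -> int:
    """Number of output bits that differ between Hash(a) and Hash(b)."""
    return bin(sha256_int(a) ^ sha256_int(b)).count("1")

# 'H' is 0x48 and 'h' is 0x68: the inputs differ in exactly one bit.
print(bin(ord("H") ^ ord("h")).count("1"), "input bit differs")
print(differing_bits("Hello", "hello"), "of 256 output bits differ")
```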

💡 Why We Care

Real-World Uses

Use Case           | How It Works                                             | Example
Message integrity  | Check whether a message was tampered with                | Bob verifies that the hash matches
Password storage   | Store a salted, slow hash of the password, not the password | Store the salt and hash of "mypassword", never the password itself
Key derivation     | Derive keys from secrets                                 | KDF(secret) = encryption key
Data deduplication | Same content = same hash                                 | Block-level duplicate detection
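For password storage, a single fast hash is usually not enough in practice, because attackers can test billions of guesses per second. A common approach is a salted, deliberately slow construction built on a hash function. The sketch below uses PBKDF2 from Python's standard library. The iteration count and the function names are assumptions for illustration, not a recommendation for any particular system.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; higher is slower for attackers too

def hash_password(password: str):
    """Return (salt, digest) to store instead of the password itself."""
    salt = os.urandom(16)  # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_digest)

salt, stored = hash_password("mypassword")
print(verify_password("mypassword", salt, stored))   # True
print(verify_password("wrong-guess", salt, stored))  # False
```

The salt makes identical passwords produce different stored values, and the iteration count makes each guess expensive.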

Signal Protocol Uses

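In simplified form, the Signal Protocol uses hash functions for:

  1. Message key derivation:
K = H(state || message_number || content)
  2. Chain key derivation:
Next_chain_key = H(current_chain_key || input)
  3. Ciphertext verification:
Check H(ciphertext) against original hash

(In the real protocol, these steps use keyed constructions built on a hash function, such as HMAC and HKDF, rather than a bare H.)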
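The chain key idea can be sketched as follows. This is a simplified illustration, not Signal's actual code: it assumes the HMAC-SHA256 chain step with single-byte constants (0x01 for the message key, 0x02 for the next chain key) that the Double Ratchet specification recommends, and the function name `kdf_chain_step` and the all-zero starting key are placeholders.

```python
import hashlib
import hmac

def kdf_chain_step(chain_key: bytes):
    """One chain step: derive a message key and the next chain key."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key

chain_key = bytes(32)  # placeholder starting key, for illustration only

for n in range(3):
    message_key, chain_key = kdf_chain_step(chain_key)
    print(f"message {n}: key starts with {message_key.hex()[:16]}...")
```

Because each step is one-way, someone who learns a later chain key cannot work backwards to earlier message keys.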

✅ Quick Check

Can you explain hash functions to a 5-year-old?

Try saying this out loud:

"A hash function is like a magic fingerprint machine. You put a picture in, it prints out a special code. If someone changes the picture even a tiny bit, the code changes completely. And you can't use the code to get the picture back - it's a one-way street!"

What's the avalanche effect?

Example:

Input: "Hello" → Hash: "a1b2c3d4..."
Input: "Hellp" → Hash: "x9y8z7w6..."

(Made-up values for illustration.)

One letter change = completely different hash output!

This is why Eve can't tamper with a message unnoticed, as long as she can't replace the hash too: the hash would betray her.


📋 Key Takeaways

Hash function = One-way digital fingerprint
Fixed size output = Any input → 256 bits (SHA-256)
Avalanche effect = One bit change → completely different hash
Collision resistant = Infeasible to find two inputs with the same hash
Can't reverse = Hash → ??? (can't find input)
Integrity check = Verify messages haven't changed
Signal Protocol use = Key derivation and message verification


🎉 What You'll Learn Next

Now you understand hash functions! These are used throughout the Signal Protocol for:

  • Deriving keys
  • Verifying message integrity
  • Chain key computation

Next, we'll learn about cryptographic signatures - how to verify who sent a message!

✍️ Continue: Wax Seals

We'll learn how Alice can prove she really sent a message, not Eve!

