Shrinking Rooms
Understanding Encoding
The Simple Story
Imagine you need to send someone a recipe:
"3 cups flour, 2 teaspoons salt, 1 egg, bake at 350°F for 45 minutes"
Problem:
- Recipe is 100 words long → Too big!
- Phone line is slow → Takes forever to send!
Solution: Write it in a secret code!
Replace each word with a short number:
- "flour" → "3"
- "salt" → "5"
- "egg" → "7"
- "bake" → "9"
Now the recipe is: "33579" (5 digits instead of 100 words!)
That's encoding! Converting stuff into a format computers can send efficiently.
Mental Model
Hold this picture in your head:
A big room (huge polynomial):
┌─────────────────────────┐
│ • • • • • │
│ • • • • • │ 256 coordinates
│ • • • • • │ Huge! Difficult!
│ • • • • • │
│ • • • • • │
└─────────────────────────┘
↓ Encoding ↓
A tiny box (compressed bytes):
┌──┐
│384│ ← Just 384 bytes! │ ├──────────┘
Much smaller, but Bob can re-expand to the original room!
Think of it like:
Phone call (encode big message → signal → decode → original message)
Map (big territory → small map → you navigate anyway)
Key (store 1 key, not 10,000 door locks)
See It Happen
Let's see encoding in action:
Try It Yourself
Question 1: You have coefficients [3, 5, 2]. How many bytes to store them (12 bits each)?
Show Answer
12 bits per coefficient, 3 coefficients:
Total bits: 12 × 3 = 36 bits
Convert to bytes: 36 ÷ 8 = 4.5 bytes
Since bytes don't split: 5 bytes (round up)
Answer: 5 bytes (42 bits used)
Question 2: If you pack [01010101, 10101010, 11110000] into bytes, what does it look like?
Show Answer
Each 8-bit pattern is already a byte:
- 01010101 → 85 (decimal)
- 10101010 → 170 (decimal)
- 11110000 → 240 (decimal)
Packed: [85, 170, 240] (3 bytes exactly!)
Question 3: ML-KEM matrices need 2,304 coefficients (12 bits each). How many bytes raw? How with compression trick?
Show Answer
Without compression:
2,304 coefficients × 12 bits = 27,648 bits 27,648 ÷ 8 = 3,456 bytes
With compression (seed trick):
Store just a 32-byte seed Expand deterministically to 3,456 bytes
Compression: 3,456 → 32 bytes (98% smaller!)
Answer: Raw: 3,456 bytes. Compressed: 32 bytes.
The Math
What is Encoding?
Encoding = Converting something to a different format
Encoding polynomial:
Encode coefficients:
Where = concatenation (join together)
Coefficient Bit Encoding
ML-KEM uses 12-bit encoding for coefficients:
Coefficient: 3 → 000000000011 (12 bits)
Coefficient: 5 → 00000000101 (12 bits)
Coefficient: 2 → 000000000010 (12 bits)
Join them: 000000000011 000000001010 000000000010
Why 12 bits?
- Coefficients: 0 to 3,328
- 12 bits = 0 to 4,095 (enough for 3,328)
Bit Packing
Efficient storage means packing 12 bits into 8-bit bytes:
Coefficient 1: 010101010101 (12 bits)
Coefficient 2: 001100110011 (12 bits)
Pack into bytes:
[01010101] [0100 001]|100110011]
Note: Boundaries cross bytes! Must pack carefully.
Bit packing = Join bit strings, then split into 8-bit groups
Compression: Seed Trick
Instead of storing 3,456 bytes (matrix A):
- Generate 32-byte random seed
- Store just the seed (32 bytes)
- When needed: Expand seed → full matrix (3,456 bytes)
Compression ratio: 3,456 / 32 ≈ 99% smaller!
Why We Care
ML-KEM Needs Efficient Storage
Matrix A: 3×3 = 9 polynomials = 2,304 coefficients
Without encoding:
- Raw: 12 bits × 2,304 = 27,648 bits = 3,456 bytes
With encoding:
- Compress: 3,456 → 864 bytes (public key component)
- OR seed representation: 3,456 → 32 bytes
Result: Public key: 864 bytes + 320 bytes (vector t) = 1,184 bytes
Transmission Speed
Imagine sending across slow network:
Without encoding:
- 3,456 bytes × 8 seconds/KB ≈ 27 seconds
With encoding:
- 864 bytes × 8 seconds/KB ≈ 7 seconds
3× faster! (And with seed: 32 bytes = instant!)
Quick Check
Can you explain encoding to a 5-year-old?
Try saying this out loud:
"Encoding is like writing a long recipe in a secret code. instead of writing 'flour, sugar, egg' (lots of words), you replace each word with a number like '3, 5, 7'. It's a small code but still means the same thing!"
Can you compute encoding sizes?
Try this examples:
Coefficient [4, 7, 2]. Encode with 12 bits each.
- 4 = 000000000100 (12 bits)
- 7 = 000000000111 (12 bits)
- 2 = 000000000010 (12 bits)
Join: 000000000100 000000000111 000000000010 (36 bits)
To bytes: 36 ÷ 8 = 4.5 → 5 bytes
Answer: 5 bytes (42 bits used)
Key Takeaways
Encoding = Convert to computer-storable format 12 bits/coeff = Needed for 0-3,328 range Bit packing = Join bits into 8-bit bytes Compression = 3,456 bytes → 864 bytes (matrix A) Seed trick = 32 bytes → 3,456 bytes (98% smaller) Public key = 1,184 bytes (864 A + 320 t) Encoding/decoding = Reversible process (Bob can recover)