Skip to main content

Shrinking Rooms

Understanding Encoding


The Simple Story

Imagine you need to send someone a recipe:

"3 cups flour, 2 teaspoons salt, 1 egg, bake at 350°F for 45 minutes"

Problem:

  • Recipe is 100 words long → Too big!
  • Phone line is slow → Takes forever to send!

Solution: Write it in a secret code!

Replace each word with a short number:

  • "flour" → "3"
  • "salt" → "5"
  • "egg" → "7"
  • "bake" → "9"

Now the recipe is: "33579" (5 digits instead of 100 words!)

That's encoding! Converting stuff into a format computers can send efficiently.


Mental Model

Hold this picture in your head:

A big room (huge polynomial):
┌─────────────────────────┐
│ • • • • • │
│ • • • • • │ 256 coordinates
│ • • • • • │ Huge! Difficult!
│ • • • • • │
│ • • • • • │
└─────────────────────────┘

↓ Encoding ↓

A tiny box (compressed bytes):
┌──┐
│384│ ← Just 384 bytes! │ ├──────────┘

Much smaller, but Bob can re-expand to the original room!

Think of it like:

Phone call (encode big message → signal → decode → original message)

Map (big territory → small map → you navigate anyway)

Key (store 1 key, not 10,000 door locks)


See It Happen

Let's see encoding in action:


Try It Yourself

Question 1: You have coefficients [3, 5, 2]. How many bytes to store them (12 bits each)?

Show Answer

12 bits per coefficient, 3 coefficients:

Total bits: 12 × 3 = 36 bits

Convert to bytes: 36 ÷ 8 = 4.5 bytes

Since bytes don't split: 5 bytes (round up)

Answer: 5 bytes (42 bits used)


Question 2: If you pack [01010101, 10101010, 11110000] into bytes, what does it look like?

Show Answer

Each 8-bit pattern is already a byte:

  • 01010101 → 85 (decimal)
  • 10101010 → 170 (decimal)
  • 11110000 → 240 (decimal)

Packed: [85, 170, 240] (3 bytes exactly!)


Question 3: ML-KEM matrices need 2,304 coefficients (12 bits each). How many bytes raw? How with compression trick?

Show Answer

Without compression:

2,304 coefficients × 12 bits = 27,648 bits 27,648 ÷ 8 = 3,456 bytes

With compression (seed trick):

Store just a 32-byte seed Expand deterministically to 3,456 bytes

Compression: 3,456 → 32 bytes (98% smaller!)

Answer: Raw: 3,456 bytes. Compressed: 32 bytes.


The Math

What is Encoding?

Encoding = Converting something to a different format

Encoding polynomial: p(x)=a0+a1x+a2x2++a255x255p(x) = a_0 + a_1x + a_2x^2 + \dots + a_{255}x^{255}

Encode coefficients: Encode(p)=a0a1a2a255\text{Encode}(p) = a_0 \| a_1 \| a_2 \| \dots \| a_{255}

Where \| = concatenation (join together)

Coefficient Bit Encoding

ML-KEM uses 12-bit encoding for coefficients:

Coefficient: 3  →  000000000011 (12 bits)
Coefficient: 5 → 00000000101 (12 bits)
Coefficient: 2 → 000000000010 (12 bits)

Join them: 000000000011 000000001010 000000000010

Why 12 bits?

  • Coefficients: 0 to 3,328
  • 12 bits = 0 to 4,095 (enough for 3,328)

Bit Packing

Efficient storage means packing 12 bits into 8-bit bytes:

Coefficient 1: 010101010101 (12 bits)
Coefficient 2: 001100110011 (12 bits)

Pack into bytes:
[01010101] [0100 001]|100110011]

Note: Boundaries cross bytes! Must pack carefully.

Bit packing = Join bit strings, then split into 8-bit groups

Compression: Seed Trick

Instead of storing 3,456 bytes (matrix A):

  1. Generate 32-byte random seed
  2. Store just the seed (32 bytes)
  3. When needed: Expand seed → full matrix (3,456 bytes)

Compression ratio: 3,456 / 32 ≈ 99% smaller!


Why We Care

ML-KEM Needs Efficient Storage

Matrix A: 3×3 = 9 polynomials = 2,304 coefficients

Without encoding:

  • Raw: 12 bits × 2,304 = 27,648 bits = 3,456 bytes

With encoding:

  • Compress: 3,456 → 864 bytes (public key component)
  • OR seed representation: 3,456 → 32 bytes

Result: Public key: 864 bytes + 320 bytes (vector t) = 1,184 bytes

Transmission Speed

Imagine sending across slow network:

Without encoding:

  • 3,456 bytes × 8 seconds/KB ≈ 27 seconds

With encoding:

  • 864 bytes × 8 seconds/KB ≈ 7 seconds

3× faster! (And with seed: 32 bytes = instant!)


Quick Check

Can you explain encoding to a 5-year-old?

Try saying this out loud:

"Encoding is like writing a long recipe in a secret code. instead of writing 'flour, sugar, egg' (lots of words), you replace each word with a number like '3, 5, 7'. It's a small code but still means the same thing!"

Can you compute encoding sizes?

Try this examples:

Coefficient [4, 7, 2]. Encode with 12 bits each.

  • 4 = 000000000100 (12 bits)
  • 7 = 000000000111 (12 bits)
  • 2 = 000000000010 (12 bits)

Join: 000000000100 000000000111 000000000010 (36 bits)

To bytes: 36 ÷ 8 = 4.5 → 5 bytes

Answer: 5 bytes (42 bits used)


Key Takeaways

Encoding = Convert to computer-storable format 12 bits/coeff = Needed for 0-3,328 range Bit packing = Join bits into 8-bit bytes Compression = 3,456 bytes → 864 bytes (matrix A) Seed trick = 32 bytes → 3,456 bytes (98% smaller) Public key = 1,184 bytes (864 A + 320 t) Encoding/decoding = Reversible process (Bob can recover)