Building Scalable Group Messaging with MLS (Message Layer Security)
⚠️ WARNING: This document is not finished. The details in this document are subject to change.
End-to-end encrypted messaging for two people is a solved problem—Signal Protocol has set the gold standard. But what happens when you want to scale that security to group chats with dozens or hundreds of participants? Traditional pairwise encryption becomes a nightmare: N participants require N(N-1)/2 encrypted channels, each with its own key management overhead.
Enter MLS (Message Layer Security), the IETF's RFC 9420 standard designed specifically for scalable group messaging. MLS provides the same core security guarantees as Signal Protocol (forward secrecy, post-compromise security, and authentication) but stays efficient for groups ranging from two members to thousands.
In this article, we'll explore how MLS works, why it's a game-changer for group messaging, and walk through a complete browser-based implementation using the ts-mls library. We'll cover everything from the TreeKEM algorithm to practical P2P integration with WebRTC.
Introduction to MLS
Message Layer Security (MLS) is a cryptographic protocol designed to provide end-to-end encryption for group messaging at scale. Published as RFC 9420 by the IETF in July 2023, MLS represents years of cryptographic research and real-world testing.
What Makes MLS Special?
Unlike traditional approaches to group messaging, MLS is built from the ground up for efficiency and security:
🔐 End-to-End Encryption
- Messages encrypted on sender's device, decrypted only on recipients' devices
- No server can read message contents
- Same security level as Signal Protocol, but for groups
⚡ Scalable Key Management
- Logarithmic complexity for key updates: O(log N) instead of O(N) (when the tree has few blank nodes)
- 100-person group? Only ~7 operations instead of 100
- 1000-person group? Only ~10 operations instead of 1000
🔄 Forward Secrecy
- Compromise of today's keys doesn't reveal yesterday's messages
- Each message is encrypted under a fresh key from a per-sender hash ratchet, and used keys are deleted
- Protection even if long-term identity keys are leaked
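Per-message key rotation can be pictured as a one-way hash ratchet. The sketch below is a simplified illustration built on WebCrypto HMAC-SHA-256 and assumes a runtime that exposes `crypto.subtle`; it is not the exact RFC 9420 `DeriveTreeSecret` construction, and the `SenderRatchet` class, the `hmacSha256` helper, and the `"key"`/`"secret"` labels are hypothetical.

```typescript
// Illustrative per-sender hash ratchet (NOT byte-compatible with RFC 9420).
async function hmacSha256(key: ArrayBuffer, label: string): Promise<ArrayBuffer> {
  const hmacKey = await crypto.subtle.importKey(
    "raw", key, { name: "HMAC", hash: "SHA-256" }, false, ["sign"],
  );
  return crypto.subtle.sign("HMAC", hmacKey, new TextEncoder().encode(label));
}

class SenderRatchet {
  private chainSecret: ArrayBuffer;
  private generation = 0;

  constructor(initialSecret: ArrayBuffer) {
    this.chainSecret = initialSecret;
  }

  // Derive the key for the next message, then replace the chain secret with a
  // one-way function of itself so earlier keys cannot be recomputed later.
  // Not safe for concurrent calls: await each call before making the next.
  async nextMessageKey(): Promise<{ generation: number; key: ArrayBuffer }> {
    const current = this.chainSecret;
    const generation = this.generation++;
    const key = await hmacSha256(current, `key ${generation}`);
    this.chainSecret = await hmacSha256(current, `secret ${generation}`);
    return { generation, key };
  }
}
```

Because each step overwrites the chain secret with a one-way function of itself, later ratchet state cannot be run backwards to recover earlier message keys. JavaScript offers no guarantee that the old bytes are wiped from memory, so deletion here is best effort.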
🛡️ Post-Compromise Security
- System "heals" from key compromise
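- Fresh key exchanges (HPKE, which uses Diffie-Hellman in the common cipher suites) generate new key material
- Once the compromised member commits fresh keys and the attacker's access has ended, the attacker can no longer read new messages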
✅ Asynchronous Operations
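- Members can be added to a group while offline; they process the Welcome message when they reconnect
- No requirement for all participants to be online simultaneously
- Key packages are pre-published to a server so others can add you at any time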
MLS vs Signal Protocol
| Feature | Signal Protocol | MLS Protocol |
|---|---|---|
| Use Case | 1:1 messaging | Group messaging |
| Participants | 2 | 2 to thousands |
| Key Update Complexity | O(1) | O(log N) |
| Algorithm | Double Ratchet | TreeKEM |
| Key Structure | Chain keys | Binary tree |
| Asynchronous | ✅ Yes | ✅ Yes |
| Forward Secrecy | ✅ Yes | ✅ Yes |
| Post-Compromise Security | ✅ Yes | ✅ Yes |
| Standardization | De facto | RFC 9420 (IETF) |
Real-World Applications
MLS is starting to see production deployment, and its design suits a wide range of applications:
✅ Messaging Apps
- Large group chats (family, friends, communities)
- Team collaboration platforms
- Enterprise secure messaging
✅ Video Conferencing
- Encrypted group video calls with text chat
- Webinars with encrypted Q&A
- Virtual classrooms
✅ IoT & Industrial
- Device-to-device group communication
- Sensor networks
- Industrial control systems
✅ Blockchain & Web3
- DAO governance discussions
- Private group coordination
- NFT community chats
How MLS Fits the Modern Web
MLS is transport-agnostic, and the building blocks it needs are available to web applications:
- WebCrypto API Support: AES-GCM and HKDF in all major browsers; X25519 and Ed25519 in recent browser releases
- Pure JavaScript: No native dependencies required (with libraries like ts-mls)
- WebRTC Integration: MLS ciphertexts travel over P2P data channels like any other payload
- IndexedDB Storage: Persist group state locally
- Service Workers: Process incoming commits and key updates in the background, within the limits browsers place on service workers
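Because X25519 and Ed25519 support is newer than the rest of WebCrypto, it is worth checking at runtime. A minimal sketch, assuming the standard `crypto.subtle` API: it tries to generate a throwaway key for each algorithm used by cipher suite 0x0001 and reports which attempts succeed. `supportsKeyGen` and `webCryptoReport` are hypothetical helper names, not part of ts-mls.

```typescript
// Try to generate a throwaway key; success means the runtime supports the algorithm.
async function supportsKeyGen(
  algorithm: AlgorithmIdentifier | AesKeyGenParams,
  usages: KeyUsage[],
): Promise<boolean> {
  try {
    await crypto.subtle.generateKey(algorithm, false, usages);
    return true;
  } catch {
    return false; // algorithm unsupported, or no WebCrypto in this runtime
  }
}

// Primitives used by cipher suite 0x0001. HKDF is omitted because it cannot be
// created with generateKey (it is imported from existing key material instead).
async function webCryptoReport(): Promise<Record<string, boolean>> {
  return {
    X25519: await supportsKeyGen({ name: "X25519" }, ["deriveBits"]),
    Ed25519: await supportsKeyGen({ name: "Ed25519" }, ["sign", "verify"]),
    "AES-GCM": await supportsKeyGen({ name: "AES-GCM", length: 128 }, ["encrypt", "decrypt"]),
  };
}

webCryptoReport().then((report) => console.log(report));
```

An application could use such a report to decide whether to fall back to a JavaScript implementation of a missing primitive.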
In the rest of this article, we'll build a complete MLS implementation that runs entirely in the browser, providing Signal-level security for group chats without a central server relaying messages.
The Group Messaging Problem
Before diving into MLS, let's understand why group messaging is fundamentally different from one-to-one encryption and why naive approaches don't scale.
The Naive Approach: Pairwise Encryption
The simplest way to do group messaging is pairwise encryption: encrypt each message separately for each recipient.
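The fan-out pattern looks like this in a minimal sketch. `encryptForPeer` is a hypothetical stand-in for one pairwise session (for example, a Double Ratchet session with that peer), passed in as a callback so the loop structure stays in focus.

```typescript
// Hypothetical stand-in for encrypting under one pairwise session.
type EncryptForPeer = (peerId: string, plaintext: string) => string;

// Naive group send: one separate ciphertext for every other member.
function sendPairwise(
  senderId: string,
  members: string[],
  plaintext: string,
  encryptForPeer: EncryptForPeer,
): Map<string, string> {
  const outbox = new Map<string, string>();
  for (const peerId of members) {
    if (peerId === senderId) continue; // the sender does not encrypt to itself
    outbox.set(peerId, encryptForPeer(peerId, plaintext));
  }
  return outbox; // members.length - 1 ciphertexts for a single message
}
```

Every call costs `members.length - 1` encryptions and produces that many ciphertexts, which is where the problems with this approach start.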
Problems with Pairwise Encryption:
❌ Bandwidth Explosion: Send N-1 copies of every message
- 10 people = 9 encrypted copies per message
- 100 people = 99 encrypted copies per message
- Video call with 50 people = 49 copies of every video frame!
❌ Computational Overhead: Encrypt each message N-1 times
- Each encryption advances a separate ratchet (a KDF step and an AEAD encryption per message, plus periodic ECDH ratchet steps)
- Mobile devices quickly drain battery
- Desktop clients consume unnecessary CPU
❌ State Management Nightmare: Maintain N-1 pairwise sessions
- Each pair needs separate Double Ratchet state
- Adding/removing members requires N updates
- Synchronization becomes extremely complex
❌ No Group Context: No shared group state
- Can't verify all members see same group membership
- No transcript consistency
- Difficult to implement group-level features
The Scalability Problem
Let's quantify the problem with a real-world example:
Scenario: 50-Person Video Call with Text Chat
(Here a "round" means every one of the 50 members sends one chat message.)
Pairwise Encryption Approach:
- Ciphertexts per chat message: 49 (one per recipient)
- Encryption operations per chat message: 49
- Pairwise sessions each member manages: 49
- Ciphertexts per round: 50 × 49 = 2,450
For 100 rounds (5,000 chat messages):
- Total ciphertexts: 245,000
- Total encryption operations: 245,000
MLS Approach:
- Ciphertexts per chat message: 1
- Encryption operations per chat message: 1
- Tree nodes refreshed per key update: ~6 (⌈log₂ 50⌉)
- Ciphertexts per round: 50
For 100 rounds (5,000 chat messages):
- Total ciphertexts: 5,000
- Total encryption operations: 5,000
Result: for application messages, MLS sends 49x fewer ciphertexts and performs 49x fewer encryptions, at the cost of occasional O(log N) key updates.
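The arithmetic above can be reproduced with a small helper. `estimateCosts` is a hypothetical function for illustration; like the scenario, it counts application-message ciphertexts only and ignores the cost of Commits.

```typescript
// Hypothetical helper reproducing the scenario's arithmetic.
// A "round" means every member sends one chat message.
interface CostEstimate {
  pairwiseCiphertexts: number; // one ciphertext per recipient per message
  mlsCiphertexts: number;      // one ciphertext per message
  updatePathLength: number;    // parent nodes refreshed by one key update (full tree)
}

function estimateCosts(members: number, rounds: number): CostEstimate {
  if (!Number.isInteger(members) || members < 2) {
    throw new Error("members must be an integer >= 2");
  }
  const messages = members * rounds;
  return {
    pairwiseCiphertexts: messages * (members - 1),
    mlsCiphertexts: messages,
    updatePathLength: Math.ceil(Math.log2(members)),
  };
}

const est = estimateCosts(50, 100);
console.log(est.pairwiseCiphertexts / est.mlsCiphertexts); // 49
```

For 50 members and 100 rounds this gives 245,000 pairwise ciphertexts, 5,000 MLS ciphertexts, and an update path of 6 parent nodes.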
Requirements for a Group Messaging Protocol
A secure, scalable group messaging protocol must provide:
Security Requirements:
- ✅ End-to-End Encryption: Messages only readable by group members
- ✅ Forward Secrecy: Past messages secure even if keys compromised
- ✅ Post-Compromise Security: Recovery from key compromise
- ✅ Authentication: Verify sender identity
- ✅ Transcript Consistency: All members agree on message order and group state
Efficiency Requirements:
- ✅ Logarithmic Key Updates: O(log N) complexity
- ✅ Single Message Encryption: One ciphertext for all recipients
- ✅ Efficient Member Changes: Add/remove without rekeying everyone
- ✅ Asynchronous Operations: Work when members are offline
Operational Requirements:
- ✅ Dynamic Membership: Add/remove members at any time
- ✅ Crash Recovery: Rejoin after network failure
- ✅ State Consistency: All members synchronized
Enter TreeKEM
MLS solves these problems with TreeKEM, a key agreement protocol based on binary trees. Instead of maintaining pairwise keys, TreeKEM organizes group members into a binary tree where:
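- Leaf nodes = Group members
- Parent nodes = Key pairs whose private keys are shared by the members of the subtree below
- Root node = A secret known to every member, which seeds the group's key schedule
Key Insight: When Alice refreshes her keys with a Commit:
- She generates new secrets for each node on the path from her leaf to the root: log₂(4) = 2 parent nodes in a 4-member group
- She encrypts each new secret once, to the public key of the sibling subtree, so a single ciphertext reaches every member of that subtree
- When she then sends a chat message, she encrypts it once with a key derived from the new epoch's secrets, and every member can decrypt it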
This is the fundamental innovation that makes MLS scale to thousands of participants.
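Which members receive each encrypted path secret can be worked out in a few lines. A minimal sketch, assuming a full tree with no blank nodes and a member count that is a power of two; `updatePathRecipients` is a hypothetical helper that works with leaf numbers (0 to n - 1) rather than the RFC's node indices.

```typescript
// For the member at leaf `sender`, list which leaves can decrypt each path
// secret in an UpdatePath. Assumes a full tree of n leaves (n a power of two)
// with no blank nodes, so there is exactly one HPKE encryption per level.
function updatePathRecipients(sender: number, n: number): number[][] {
  if (!Number.isInteger(n) || n < 2 || (n & (n - 1)) !== 0) {
    throw new Error("n must be a power of two >= 2");
  }
  if (!Number.isInteger(sender) || sender < 0 || sender >= n) {
    throw new Error("sender out of range");
  }
  const perLevel: number[][] = [];
  // `size` = number of leaves in the sibling (copath) subtree at this level.
  for (let size = 1; size < n; size *= 2) {
    const ownStart = Math.floor(sender / size) * size; // first leaf of sender's subtree
    const siblingStart = ownStart ^ size;              // adjacent subtree of equal size
    perLevel.push(Array.from({ length: size }, (_, i) => siblingStart + i));
  }
  return perLevel;
}
```

For the 4-member example, `updatePathRecipients(0, 4)` returns `[[1], [2, 3]]`: two encryptions, the first reaching one member and the second reaching two, together covering all three other members.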
How MLS Works: Protocol Deep Dive
Now let's explore the MLS protocol in detail—how keys are managed, how groups evolve, and how messages stay secure.
Core Components
MLS consists of several key components working together:
1. Key Packages
A key package is like a business card for joining MLS groups. It contains:
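```typescript
// Simplified KeyPackage; zero-filled placeholders stand in for real keys and signatures
const nowSeconds = Math.floor(Date.now() / 1000);

const keyPackage = {
  version: "mls10",                          // MLS protocol version
  cipherSuite: 0x0001,                       // MLS_128_DHKEMX25519_AES128GCM_SHA256_Ed25519
  initKey: new Uint8Array(32),               // X25519 public key, used to encrypt the Welcome
  leafNode: {
    credential: {
      type: "basic",
      identity: "alice@example.com",         // User identity
    },
    capabilities: {                          // Supported features (abbreviated)
      versions: ["mls10"],
      cipherSuites: [0x0001],
    },
    encryptionKey: new Uint8Array(32),       // X25519 public key, used by TreeKEM
    signatureKey: new Uint8Array(32),        // Ed25519 public key, used for authentication
    lifetime: {                              // Validity period: 90 days from now
      notBefore: nowSeconds,
      notAfter: nowSeconds + 90 * 24 * 60 * 60,
    },
  },
  signature: new Uint8Array(64),             // Ed25519 signature over the rest of the package
};
```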
Purpose:
- Published in advance so that others can add you to a group
- Uploaded to a server's "key package store"
- Consumed once and discarded (like one-time prekeys)
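The publish-then-claim lifecycle can be sketched with an in-memory store. `KeyPackageStore` is hypothetical and for illustration only: a real deployment would keep packages on a server and store serialized KeyPackages, and the `keyPackageRef` field is just an illustrative label.

```typescript
// Hypothetical in-memory key package store, for illustration only.
interface StoredKeyPackage {
  identity: string;      // whose package this is
  keyPackageRef: string; // illustrative label identifying the package
  data: Uint8Array;      // opaque serialized KeyPackage
}

class KeyPackageStore {
  private readonly byIdentity = new Map<string, StoredKeyPackage[]>();

  // Called by a member ahead of time, typically with several packages.
  publish(kp: StoredKeyPackage): void {
    const queue = this.byIdentity.get(kp.identity) ?? [];
    queue.push(kp);
    this.byIdentity.set(kp.identity, queue);
  }

  // Called by someone adding `identity` to a group; each package is handed
  // out at most once, like a one-time prekey.
  claim(identity: string): StoredKeyPackage | undefined {
    return this.byIdentity.get(identity)?.shift();
  }

  remaining(identity: string): number {
    return this.byIdentity.get(identity)?.length ?? 0;
  }
}
```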
2. Ratchet Tree
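The ratchet tree is a binary tree that stores the group's public key material. Each member also holds the private keys for its own leaf and for every parent node on its path to the root.
Key Properties:
- Leaf nodes (even indices): Contain a member's credential, signature key, and encryption public key
- Parent nodes (odd indices): Contain a public key whose private key is known only to the members beneath that node
- Root node: Its secret feeds the key schedule from which the epoch's message keys are derived
- Height: ⌈log₂(N)⌉ where N = number of members
Navigation Rules (RFC 9420 array layout, nodes numbered left to right):
- Leaf n is stored at node index 2n
- Level k of node i: the number of trailing 1 bits in i (leaves have level 0)
- Left child of a parent i at level k: i XOR 2^(k-1)
- Right child of a parent i at level k: i XOR (3 × 2^(k-1))
- Parent of node i at level k: (i OR 2^k) XOR (b × 2^(k+1)), where b is bit k+1 of i (bit 0 being the least significant)
- Root: node index (number of leaves) - 1; RFC 9420 pads the tree with blank leaves so the leaf count is always a power of two
- Example with 4 members: leaves at 0, 2, 4, 6; parents at 1 (over leaves 0 and 2) and 5 (over leaves 4 and 6); root at 3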
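The navigation rules translate directly into bit operations. A minimal sketch modeled on the array-based tree math in RFC 9420, assuming a power-of-two leaf count; the function names (`rootOf`, `parentOf`, `directPath`, and so on) are chosen for this example and are not the ts-mls API.

```typescript
// Nodes are numbered left to right: leaves at even indices, parents at odd.
// Assumes the number of leaves n is a power of two.

const leafToNode = (leaf: number): number => 2 * leaf;

// Level k = number of trailing 1 bits; leaves are level 0.
function level(x: number): number {
  let k = 0;
  while (((x >> k) & 1) === 1) k++;
  return k;
}

function rootOf(n: number): number {
  if (n < 1 || (n & (n - 1)) !== 0) throw new Error("leaf count must be a power of two");
  return n - 1; // the middle of the 2n - 1 nodes
}

function leftChild(x: number): number {
  const k = level(x);
  if (k === 0) throw new Error("leaves have no children");
  return x ^ (1 << (k - 1));
}

function rightChild(x: number): number {
  const k = level(x);
  if (k === 0) throw new Error("leaves have no children");
  return x ^ (3 << (k - 1));
}

function parentOf(x: number, n: number): number {
  if (x === rootOf(n)) throw new Error("the root has no parent");
  const k = level(x);
  const b = (x >> (k + 1)) & 1; // 1 if x is a right child, 0 if a left child
  return (x | (1 << k)) ^ (b << (k + 1));
}

// Parent nodes from a leaf up to and including the root (the "direct path").
function directPath(node: number, n: number): number[] {
  const path: number[] = [];
  const root = rootOf(n);
  while (node !== root) {
    node = parentOf(node, n);
    path.push(node);
  }
  return path;
}
```

In a 4-member group, `directPath(leafToNode(0), 4)` returns `[1, 3]`: the two parent nodes the member at leaf 0 refreshes in a key update.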
3. Group Context
The group context provides shared group state:
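```typescript
// Simplified GroupContext; zero-filled placeholders stand in for real hashes
const groupContext = {
  version: "mls10",
  cipherSuite: 0x0001,
  groupId: "secure-chat-room",
  epoch: 42,                                    // Increments with each Commit
  treeHash: new Uint8Array(32),                 // Hash of the ratchet tree; ensures consistency
  confirmedTranscriptHash: new Uint8Array(32),  // Running hash over every Commit so far
  extensions: [] as unknown[],                  // Group metadata
};
```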
Purpose:
- Synchronized across all members
- Changes only via commits
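- Ensures all members have an identical view of the group: the GroupContext is mixed into the key schedule, so members whose views diverge derive different keys and cannot decrypt each other's messages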
4. Epochs
An epoch is a period during which the group's membership and epoch secret stay fixed.
Epoch Transitions:
- Triggered by Commits (member add/remove, key rotation)
- Increment the epoch number
- Update the tree hash
- Derive a fresh epoch secret, from which that epoch's message keys are derived
- All members must process the Commit to stay synchronized
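Epoch bookkeeping can be sketched as a counter that only moves forward when a Commit for the current epoch arrives. `EpochTracker` and `CommitMessage` are hypothetical and heavily simplified: a real client would also verify the Commit's signature, apply its proposals to the tree, and rerun the key schedule before advancing.

```typescript
// Hypothetical, heavily simplified epoch bookkeeping for one member.
interface CommitMessage {
  epoch: number;     // epoch in which the Commit was created
  committer: string; // illustrative: who sent the Commit
}

type CommitResult = "applied" | "stale" | "future";

class EpochTracker {
  epoch: number;

  constructor(initialEpoch = 0) {
    this.epoch = initialEpoch;
  }

  processCommit(commit: CommitMessage): CommitResult {
    if (commit.epoch < this.epoch) return "stale";  // superseded by a Commit we already applied
    if (commit.epoch > this.epoch) return "future"; // we missed at least one earlier Commit
    // A real client would verify, apply proposals, and rerun the key schedule here.
    this.epoch += 1;
    return "applied";
  }
}
```

If two members commit concurrently in the same epoch, only the first Commit processed advances the group and the second becomes stale, which is why MLS deployments need the Delivery Service (or some other mechanism) to order Commits. A `future` result means the member has fallen behind and must resynchronize.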
