Email Domain Replication System Notes
Goals
-
Reliable, fault tolerant storage of email
-
High-speed local access to email
-
Efficient bandwidth usage across WAN
-
Secure communication
-
Domain-level (sparse) replication
-
Simple addition and removal of servers and domains
Architecture
See the diagram for a visual description of how the pieces fit together.
-
Replication model:
-
Each server runs a remote proxy that connects to as many other servers
as possible.
-
A remote agent maintains a connection to a remote.
-
A local agent manages the file store.
-
The remote agent arranges for locally-initiated actions to be completed
remotely, and for remotely-initiated actions to be completed locally.
-
Security model:
-
Each server has a public/private key pair.
-
When remote agents connect, each end of a connection verifies the other
against the known public key.
-
No local encryption is done.
-
All remote connections are encrypted with a SSL wrapper.
-
Synchronization model:
-
All events are effectively recorded in a log.
-
Messages are named such that collisions between systems is impossible.
-
Once created, messages are immutable.
-
All systems are assumed to have perfect knowledge about accounts, permissions,
etc.
-
Plan #1: Communicate between proxy and agents using pipes
-
Proxy uses a local agent to manipulate mail store
-
Pros:
-
Allows for the case of being a front end to a remote mail store (but is
this a reasonable configuration?)
-
Cons:
-
One more protocol encode/decode stage
-
More complex in the case of multiple agents
-
Redo log will have multiple writers (remote agents, local proxy, remote
proxy)
-
Plan #2: Communicate between agents using files (greatly preferred)
-
No local agent
-
Access and delivery agents manipulate the mail store directly
-
Pros:
-
Simplest interaction between the various stages
-
Access agent can be reduced to a simple library that intercepts I/O system
calls
-
Remote agents can be started and stopped at will without any involvement
from other processes
-
Cons:
-
Requires a local mail store
-
Creates problems with synchronizing multiple (remote) agents without polling
-
Redo log will have multiple writers (proxy and remote agents)
Development Plan
Development should follow a "zero, one, any" plan -- implementation should
always either disallow a feature (zero), allow the feature in only one
mode or instance (one), or allow totally arbitrary use of that feature
(any). Development of a feature that is required to by "any" should if
possible proceed from the "zero" and "one" stages first.
-
Single-server model (zero remotes)
-
Proof of concept with:
-
Local only mail store
-
Simplified local agent library (don't handle folders)
-
Delivery agent
-
POP server
-
Two-server model (one remote)
-
Implement proxy server that knows about one local and one remote
-
Implement the redo log
-
Transmit the redo log
-
Trivial (no) authentication
-
Total domain replication
-
Add public/private keys to remote proxy agents
-
Full local protocol
-
Flesh out the access agent library to handle folders and message numbering
-
Modify Courier IMAP to use the access agent library
-
Multiple-server model (any remotes)
-
Simple multiple-server
-
No forwarding of redo activity
-
Implement partial domain replication
-
Each server will have a list of domains that they are responsible for
-
Each server will know the lists from the other servers
-
Only redo activity necessary for the target will be forwarded
-
Implement spanning-tree algorithms into proxy
-
Each server forwards on any activity on as necessary to partially-connected
servers
Definitions
-
EID: Email IDentifier, uniquely identifies any message across all systems
-
Domain name
-
Account name
-
Folder name
-
UID (Unique IDentifier), which is composed of:
-
Timestamp (including microseconds?)
-
PID
-
Originating hostname
-
Written as "account@domain/folder/UID"
Redo Log
-
Data for each message needs to be temporally ordered.
-
Messages do not need to be temporally ordered.
-
The log is used to determine what actions to take on a remote.
-
Previous plan: store the journal as a set of files
-
Two files per EID:
-
The message itself
-
List of locals or remotes that:
-
still need to process the message, or
-
have processed the message
-
The files are hard linked to the local mail store for non-deleted messages.
-
The redo log must therefore be in the same partition as the mail store.
-
Deleted messages are truncated in the redo log (all valid messages have
non-zero length).
-
Cases:
-
Addition of a new message
-
Create new message file in temporary area
-
Write message data to the file
-
Link the file into the redo log
-
Move the file into the mail store
-
Modifications to a message's flags
-
If a link to the message does not exist in the redo log: create it
-
Change the flags on the message in the redo log
-
Change the flags on the message in the mail store
-
Deleted message
-
If a link to the message does not exist in the redo log: create it
-
Truncate the file
-
Remove the link from the mail store
-
Message moved from one folder to another
-
If a link to the message does not exist in the redo log: create it
-
???
-
Problems:
-
Atomic operations are rather difficult and time consuming due to requiring
multiple syncs.
-
New plan: linear journal files
-
Each journal file has a maximum size.
-
Each activity is appended as a transaction record to the journal.
-
Journal can be used to replace synchronous operations on the mail store.
-
Record contents:
-
Action type
-
Domain name
-
Account name
-
Folder
-
UID
-
Message flags (optional)
-
New folder (optional)
-
New message contents (optional)
-
Cases:
-
Addition of a new message
-
Create new message file in destination account
-
Create new journal record
-
Write message data to both new file and to journal
-
Commit journal
-
Modifications to a message's flags, or message moved between folders
-
Rename message file
-
Write & commit journal record
-
Deleted message
-
-
Message moved from one folder to another
-
Unresolved Issues and Future Ideas
-
How does a new server get added to a network with existing data? One idea
is to first insert it into the tables of the remotes, then mark it as down
so that it will start accumulating redo-log entries, then mass-rsync all
the data over to the target system. After that's done, roll forwards
the redo-log entries, and mark it as up. Can this cause unsynchronization,
though?
-
This architecture doesn't address scalability. One possibility for accomplishing
clustering to achieve scalability is to go a step further with sparse domain
replication (each server has a subset of the accounts for a domain). This
would require some kind of "director" device to front-end connections to
the access agent and effect the load balancing.
-
Another step forward past the multi-server model is a multi-layer model,
where there are client servers that effectively serve as caches to the
actual mail accounts, and only store the accounts that are accessed.