April 20, 2026 (12 min read)

mbox vs Maildir vs Database: How Mail Actually Sits on Disk

Where your messages physically live shapes everything from backup strategy to incident response. Here's mbox at the byte level, Maildir's atomic delivery, and why the storage layer is where forensics actually happens.

Most email security writing skips the storage layer. People talk about TLS, about SPF, about brute-force protection on login — and then quietly assume that whatever happens to the message after delivery is some other team's problem. It usually isn't. The storage layer is where backups go wrong, where forensics happens, where encryption-at-rest decisions live, and where a successful attacker who got file-level access reads everything.

This is post 5 of the email security series. We're going to look at mbox, Maildir, mdbox, and database-backed storage, with a hex dump or directory listing for each. Then we'll talk about encryption at rest, and what an attacker with read access to your mail directory actually gets.

mbox: a single file, decades of locking pain

mbox is the original Unix mailbox format, going back to the 1970s. The whole inbox is a single text file. Messages are concatenated, each one preceded by a separator line that starts with `From ` (note the space, no colon).

Here's what an mbox file looks like in `xxd`:

00000000: 4672 6f6d 2073 656e 6465 7240 6578 7465  From sender@exte
00000010: 726e 616c 2e74 6573 7420 5475 6520 4170  rnal.test Tue Ap
00000020: 7220 3232 2031 343a 3132 3a32 3920 3230  r 22 14:12:29 20
00000030: 3236 0a52 6574 7572 6e2d 5061 7468 3a20  26.Return-Path: 
00000040: 3c73 656e 6465 7240 6578 7465 726e 616c  <sender@external
00000050: 2e74 6573 743e 0a52 6563 6569 7665 643a  .test>.Received:
...
00000234: 5468 6973 2069 7320 6120 7465 7374 2e0a  This is a test..
00000244: 0a46 726f 6d20 7361 6c6c 7940 7368 6f70  .From sally@shop
00000254: 2e74 6573 7420 5475 6520 4170 7220 3232  .test Tue Apr 22

The line `From sender@external.test Tue Apr 22 14:12:29 2026` is the separator. The byte at offset 0x244 is the newline of the blank line that ends the first message, so the next message starts at 0x245 with another `From ` line.
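To make the "separator line" idea concrete, here is a minimal Python sketch that scans raw mbox bytes and reports the offset of every line beginning with `From `. The function name `find_separators` and the sample data are made up for this example. A real reader would also check that the separator follows a blank line and carries a plausible date.

```python
def find_separators(data: bytes) -> list[int]:
    """Return the byte offset of every line that begins with b'From '."""
    offsets = []
    pos = 0
    while pos < len(data):
        # Find the end of the current line (including its newline).
        end = data.find(b"\n", pos)
        end = len(data) if end == -1 else end + 1
        if data.startswith(b"From ", pos):
            offsets.append(pos)
        pos = end
    return offsets


# Two tiny messages, hypothetical content.
sample = (
    b"From sender@external.test Tue Apr 22 14:12:29 2026\n"
    b"Subject: one\n"
    b"\n"
    b"This is a test.\n"
    b"\n"
    b"From sally@shop.test Tue Apr 22 14:15:02 2026\n"
    b"Subject: two\n"
    b"\n"
    b"Second body.\n"
)

for off in find_separators(sample):
    print(hex(off))
```

Note that this naive scan would also fire on an unescaped `From ` line inside a body, which is exactly why writers have to escape such lines.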

This raises an immediate question: what if the message body itself contains a line starting with `From `? Answer: agents that write to mbox have to escape it by prepending `>` (so it becomes `>From `), and readers have to unescape on the way out. This is "From-line escaping" and it's exactly as fragile as it sounds. Different agents have escaped differently for forty years. Some escape `From ` only (the variant usually called mboxo), which means a body line that already read `>From ` can't be told apart from an escaped one. Others also escape `>From `, `>>From `, and so on (mboxrd), which is what makes the unescaping reversible. It's a mess. Don't think about it too hard.
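For the curious, the reversible variant is small enough to sketch in Python. This assumes mboxrd-style quoting (any line matching `>*From ` gets one more `>`). The function names `escape_body` and `unescape_body` are mine, not from any particular mail agent.

```python
import re

# mboxrd-style quoting: a line that starts with zero or more '>' followed by
# 'From ' gets one extra '>' on write, and loses exactly one '>' on read.
_QUOTE = re.compile(rb"^(>*From )", re.MULTILINE)
_UNQUOTE = re.compile(rb"^>(>*From )", re.MULTILINE)


def escape_body(body: bytes) -> bytes:
    """Quote every line that could be mistaken for a separator."""
    return _QUOTE.sub(rb">\1", body)


def unescape_body(body: bytes) -> bytes:
    """Undo escape_body by stripping one level of quoting."""
    return _UNQUOTE.sub(rb"\1", body)


body = b"Hi\nFrom here on it gets tricky\n>From an earlier quote\n"
stored = escape_body(body)
assert unescape_body(stored) == body
```

The round trip only holds when the same scheme is used on both sides. Run `unescape_body` on a file written by an agent that never escaped `>From ` and you silently eat a character.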

The bigger problem is locking. Two processes writing to the same mbox at the same time will corrupt it. The history of mbox locking is the history of every Unix locking mechanism, applied haphazardly:

  • `flock()`: fast, advisory, but historically unreliable over NFS
  • `fcntl()` / `lockf()`: works over (some) NFS, but harder to use correctly
  • Dotlock: create `mailbox.lock` as an exclusive marker file, with a stale-lock cleanup heuristic. Works everywhere, but the cleanup heuristic is itself a race condition

A Linux mail setup that still uses mbox often stacks several of these at once to be safe. Dovecot's `mbox_read_locks` and `mbox_write_locks` settings each take a list of methods, for example `dotlock fcntl`. Performance under load suffers proportionally.
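Of the three, the dotlock is the easiest to show. Here is a minimal Python sketch of the idea, under some loud assumptions: the helper name `dotlock` and its timeout values are invented for this example, it relies on `O_CREAT | O_EXCL` (classic implementations create a uniquely named temp file and `link()` it instead, because `O_EXCL` was not dependable on old NFS versions), and it deliberately leaves out stale-lock cleanup, which is the racy part.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def dotlock(mailbox_path: str, timeout: float = 10.0, poll: float = 0.1):
    """Hold <mailbox_path>.lock for the duration of the with-block."""
    lock_path = mailbox_path + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL raises FileExistsError if the lock file is
            # already there, so only one process can create it.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        # Record our PID so a human can see who holds the lock.
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        yield lock_path
    finally:
        os.unlink(lock_path)
```

Usage would look like `with dotlock("/var/spool/mail/alice"): ...` around the append. A process that crashes inside the block leaves the lock file behind, which is why real implementations need the cleanup heuristic in the first place.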

mbox is mostly historical now. You'll find it in:

  • Legacy Unix systems with /var/spool/mail/ delivery
  • Mailing-list archives (where the single-file format is convenient for offline analysis)
  • People who haven't migrated yet

If you're starting fresh in 2026, do not use mbox. Use Maildir.

Maildir: one file per message

Maildir (designed by Dan Bernstein for qmail) replaced mbox's concatenated single file with a directory containing one file per message, plus a clever rename-based atomic delivery scheme.

A Maildir looks like:

~/Maildir/
├── tmp/      # in-progress deliveries
├── new/      # delivered, not yet seen by user
└── cur/      # seen, with flags encoded in filename

Here's `tree` output of a real one:

Maildir/
├── tmp/
├── new/
│   └── 1714233149.M123P456.mail.example.com
└── cur/
    ├── 1714230020.M999P888.mail.example.com:2,S
    ├── 1714231233.M111P222.mail.example.com:2,RS
    └── 1714232889.M333P444.mail.example.com:2,ST

Three things are happening here.

The filename is the unique ID. The format is `<unix-time>.M<microseconds>P<pid>.<hostname>`. Every message gets a globally unique filename without coordination. No locks needed.

Atomic delivery via rename. When the delivery agent (Dovecot's LMTP service, say) delivers a message, it writes to `tmp/<unique-id>`, then `rename()`s it into `new/<unique-id>`. POSIX `rename()` within a single filesystem is atomic: the message either appears entirely in `new/` or doesn't appear at all. No partial reads, no corrupt half-messages.
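The write-then-rename dance is short enough to sketch in Python. This is an illustration under assumptions, not what any particular delivery agent does: the function name `deliver` is made up, the filename follows the pattern shown above, and it assumes `tmp/` and `new/` already exist on the same filesystem.

```python
import os
import socket
import time


def deliver(maildir: str, message: bytes) -> str:
    """Write message into maildir/tmp, then rename it into maildir/new."""
    now = time.time()
    unique = "{}.M{}P{}.{}".format(
        int(now), int((now % 1) * 1_000_000), os.getpid(), socket.gethostname()
    )
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)

    # O_EXCL: refuse to overwrite if the name somehow already exists.
    fd = os.open(tmp_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(message)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are on disk before publishing

    # The rename is the moment the message becomes visible to readers.
    os.rename(tmp_path, new_path)
    return new_path
```

A reader that only ever looks in `new/` and `cur/` can never observe a half-written file, because the file does not exist under those names until the write has finished.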

Flags in the filename. The `:2,` suffix carries the message flags. `:2,S` means "Seen." `:2,RS` is "Replied, Seen" (flag letters are stored in ASCII order). `:2,T` is "Trashed." When you mark a message read, the IMAP server renames the file from `:2,` to `:2,S`. This is also atomic, and needs no locking.

The standard flag letters are:

  • `D`: Draft
  • `F`: Flagged
  • `P`: Passed (forwarded/redirected)
  • `R`: Replied
  • `S`: Seen
  • `T`: Trashed
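Reading and updating flags is plain string handling. A small Python sketch, assuming the `:2,` info suffix described above and the Maildir convention that flag letters are kept in ASCII order. The helper names `parse_flags` and `with_flag` are hypothetical.

```python
def parse_flags(filename: str) -> set[str]:
    """Return the flag letters after the ':2,' marker (empty set if none)."""
    _, sep, info = filename.rpartition(":2,")
    return set(info) if sep else set()


def with_flag(filename: str, flag: str) -> str:
    """Return the filename with one more flag, letters kept in ASCII order."""
    base, sep, _ = filename.rpartition(":2,")
    if not sep:
        # No info suffix yet (a file fresh out of new/).
        base = filename
    flags = parse_flags(filename) | {flag}
    return base + ":2," + "".join(sorted(flags))


name = "1714230020.M999P888.mail.example.com:2,S"
print(parse_flags(name))
print(with_flag(name, "R"))
```

A server would follow `with_flag` with an `os.rename()` inside `cur/` to make the change take effect.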

This design solves the locking problem at the cost of inode pressure on the filesystem. A user with 100,000 messages has 100,000 files in their Maildir, which made some old filesystems sad. ext4, XFS, and ZFS handle it fine in 2026.

mdbox: the compromise

Dovecot's `mdbox` format tries to combine Maildir's safety with mbox's I/O efficiency. Instead of one file per message, mdbox stores messages in bundled files, with each bundle holding many messages and a separate index pointing into them.

~/mdbox/
├── storage/
│   ├── dovecot.map.index
│   ├── dovecot.map.index.log
│   ├── m.1
│   ├── m.2
│   └── ...
└── mailboxes/
    └── INBOX/
        └── dbox-Mails/
            ├── dovecot.index
            ├── dovecot.index.cache
            └── dovecot.index.log

Each `m.N` file under `storage/` contains many messages back-to-back, with a header per message recording its size and metadata. The per-mailbox indexes under `mailboxes/` and the map index in `storage/` together resolve a mailbox UID to a (file, offset, length) tuple.
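The "index pointing into a bundle" idea can be shown with a toy model in Python. To be clear, this is not Dovecot's on-disk format: the in-memory `BytesIO` bundle, the dict index, and the function names are all invented to illustrate the (offset, length) lookup.

```python
import io


def append_message(bundle: io.BytesIO, index: dict[int, tuple[int, int]],
                   uid: int, message: bytes) -> None:
    """Append message to the bundle and record (offset, length) under uid."""
    offset = bundle.seek(0, io.SEEK_END)
    bundle.write(message)
    index[uid] = (offset, len(message))


def read_message(bundle: io.BytesIO, index: dict[int, tuple[int, int]],
                 uid: int) -> bytes:
    """Look up uid in the index and read exactly that slice of the bundle."""
    offset, length = index[uid]
    bundle.seek(offset)
    return bundle.read(length)


bundle = io.BytesIO()
index: dict[int, tuple[int, int]] = {}
append_message(bundle, index, 1, b"Subject: first\n\nhello\n")
append_message(bundle, index, 2, b"Subject: second\n\nworld\n")
print(read_message(bundle, index, 2))
```

The toy also shows the failure mode listed below: lose or corrupt the index and the bundle is just an undifferentiated run of bytes.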

The advantages:

  • Massively fewer inodes (better for filesystems with millions of messages)
  • Faster sequential reads (whole bundles fit in disk readahead)
  • Better compression (bundles can be transparently compressed)

The cost:

  • Corruption of one bundle file affects every message in it
  • Backup is harder — restoring one message means understanding the bundle format
  • It's Dovecot-specific — moving to another IMAP server requires conversion

mdbox is a sensible default for high-volume installations. For a small self-hosted setup, Maildir is simpler and the performance difference doesn't matter.

Database storage

Storing mail in a database (PostgreSQL, MySQL) is a thing some people do. Dovecot's `dict` layer can keep small key-value data such as quota in SQL, but putting message bodies themselves in a database usually means a community-maintained backend or a different server built around the idea, such as DBMail. The argument for it is "I already have a database for everything else, let me have one consistent backup." The arguments against are stronger:

  • Mail is large, sequential, write-once, read-rarely data — exactly the workload row stores are bad at
  • Database engines weren't designed for hundreds of GB of mostly-unchanging blobs per user
  • Backup of a multi-TB mail database is significantly harder than backup of a multi-TB filesystem
  • Disaster recovery: if your database dies, every user's mail is offline. With Maildir, individual user mailboxes are independent files that can be restored one at a time

I've seen people put mail in databases. I've seen most of them migrate back. Use the filesystem.

Encryption at rest

Three options, in increasing order of security and complexity:

Full-disk encryption. LUKS on Linux. Encrypts the whole volume. Defends against an attacker who steals the physical drive but not against any attacker who can `cat` files on the running server. This is the baseline (turn it on always), but it's not "mail encryption."

Filesystem-level encryption. ZFS native encryption, ext4 fscrypt, or eCryptfs. Per-user keys can be unlocked at login. Defends against an attacker with read access to the disk via another user's account, as long as the target user isn't logged in. Useful in shared-tenant scenarios.

Per-message encryption (Dovecot's `mail_crypt` plugin). Each message is encrypted with the user's public key at delivery. Even root reading the Maildir sees ciphertext. Decryption happens in the IMAP process after the user authenticates and the private key is unlocked.

`mail_crypt` is the strongest option but it has costs:

  • Server-side search becomes impossible without decrypting (defeating Dovecot's index cache)
  • Spam filtering at delivery time is complicated (most filters need the plaintext)
  • Key management is now a thing you have to do — losing a user's private key means losing their mail

For most self-hosted setups, full-disk encryption + careful access control to the mail directory is enough. For threat models where "the server itself might be compromised" is realistic, `mail_crypt` is worth the overhead.

A quick `mail_crypt` config sketch (Dovecot 2.3):

mail_plugins = $mail_plugins mail_crypt

plugin {
  mail_crypt_curve = secp521r1
  mail_crypt_save_version = 2
  mail_crypt_require_encrypted_user_key = yes
}

Each user gets a keypair generated on first login (encrypted with the user's password, so the password is required to decrypt). Read the Dovecot docs before deploying — there are several modes (per-user vs per-folder vs global keys) with different tradeoffs.

The forensic angle

Imagine an attacker who got file-level read access to your mail directory (compromised webmail server, stolen backup tape, lost-and-found laptop). What can they read?

| Storage | Attacker reads message contents? | Attacker reads metadata? |
|---------|----------------------------------|--------------------------|
| Maildir, no encryption | Yes, all of it | Yes (filenames have flags) |
| Maildir + LUKS, server off | No | No |
| Maildir + LUKS, server on | Yes (running system has the key) | Yes |
| Maildir + mail_crypt | No (encrypted blobs) | Yes (filenames, sizes) |
| mdbox + mail_crypt | No (encrypted) | Partial (index file leaks subjects) |

Two things stand out:

Full-disk encryption only protects you against offline attacks. The most common compromise isn't a stolen drive — it's a compromised account or service with read access on the running system. LUKS does nothing for that.

Index files leak metadata. Dovecot's index cache can store subject lines and other header fragments, plaintext, separately from the messages themselves. If you turn on `mail_crypt`, you also need to consider whether the index cache should be disabled or encrypted, depending on your threat model. The Dovecot docs cover this; the tutorials usually don't.

Backup considerations by format

  • mbox: back up the file. Easy to copy. Hard to back up while the server is running (locking against the live server). Snapshot the filesystem (LVM, ZFS) before backup.
  • Maildir: back up the directory. `rsync` is your friend. Each file is independent, so partial backups recover gracefully. Watch out for filesystems with bad small-file performance on the backup side.
  • mdbox: back up the bundles and index. Consistency between bundle files and index files matters; snapshot the filesystem first if the server is running. Don't try to incremental-rsync mdbox without thinking: bundles are rewritten on compaction and rsync may copy the entire file each time.
  • Database: use the database's native backup tool (`pg_dump`, `mysqldump`). Plan for the size: multi-TB dumps take hours.

Gotcha: Maildir over NFS

Maildir was designed to work safely over NFS if the NFS server preserves rename semantics correctly and the client uses the right options. Dovecot has a `mail_nfs_storage = yes` option that turns on extra cache invalidation for NFS-backed Maildirs. Forget to set it and you get intermittent "mailbox state lies between two clients" bugs that are spectacularly hard to debug.

If you're running Dovecot on a single host with local storage, ignore this. If you're running a cluster of Dovecot front-ends sharing a Maildir over NFS — read the docs three times before deploying.

What I'd Tell My Past Self

The storage layer is where most of the operational pain in self-hosted mail comes from. It's where backups break, where index corruption strikes during power loss, where a `chmod` mistake turns into a privacy disaster. Spend the time to understand which format you're using and what its failure modes look like, before you have a crisis.

If you're starting fresh, my recommendations:

  • Use Maildir (simple, robust) or mdbox (faster at scale)
  • Use full-disk encryption, always
  • Add `mail_crypt` only if your threat model genuinely includes server compromise
  • Test your backups by actually restoring from them, not just by checking that they ran
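On that last point, a restore test doesn't have to be elaborate. Here is a rough Python sketch that compares a source tree against a restored copy by SHA-256. The function names are mine, and it assumes you compare against a snapshot of the source rather than the live Maildir, since flag renames on a live mailbox would show up as missing files.

```python
import hashlib
import os


def tree_digests(root: str) -> dict[str, str]:
    """Map each file's path relative to root to its SHA-256 hex digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Hash in 1 MiB chunks so large messages don't fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests


def compare_trees(source: str, restored: str) -> tuple[set[str], set[str]]:
    """Return (missing_from_restore, content_mismatch) as sets of relative paths."""
    a, b = tree_digests(source), tree_digests(restored)
    missing = set(a) - set(b)
    mismatch = {p for p in a.keys() & b.keys() if a[p] != b[p]}
    return missing, mismatch
```

Two empty sets back from `compare_trees` is the result you want. Anything else is a restore problem you'd rather find now than during an incident.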

Post 6 takes us into the CVE history. Now that you know what's at stake on disk and on the wire, the bug patterns will make a lot more sense.
