SMTP, IMAP, and POP3 at the Byte Level

There's a specific kind of clarity you only get from looking at protocol bytes. Diagrams lie a little bit — they smooth over the real corners of how a protocol works. The wire doesn't lie. If you've never opened Wireshark on your own mail server's traffic, this post is going to be the most fun one in the series.

We're going to walk through SMTP, IMAP, and POP3 by capturing real sessions and reading the bytes. Then we'll get into the interesting stuff: the STARTTLS upgrade dance, why submission and SMTPS coexist, and the byte-level parser disagreement that made the 2023 SMTP smuggling attack possible.

A note on the labs: every command in this post can be run against your own mail server (or a local test server) without harming anything. Don't run them against someone else's server. Even read-only port scans can land you in awkward conversations.

SMTP: a forty-year-old chatty protocol

SMTP is text. Every command is a short ASCII keyword followed by arguments, terminated by \r\n. Every response is a three-digit code followed by a space (or by a hyphen on every line but the last of a multi-line reply) and a human-readable message. It looks like this on the wire, captured on port 25 with tcpdump -A:

S: 220 mail.example.com ESMTP Postfix (Debian/GNU)
C: EHLO sender.test
S: 250-mail.example.com
S: 250-PIPELINING
S: 250-SIZE 10240000
S: 250-VRFY
S: 250-ETRN
S: 250-STARTTLS
S: 250-ENHANCEDSTATUSCODES
S: 250-8BITMIME
S: 250-DSN
S: 250 SMTPUTF8
C: MAIL FROM:<sender@external.test>
S: 250 2.1.0 Ok
C: RCPT TO:<me@example.com>
S: 250 2.1.5 Ok
C: DATA
S: 354 End data with <CR><LF>.<CR><LF>
C: From: sender@external.test
C: To: me@example.com
C: Subject: Hello
C:
C: This is a test message.
C: .
S: 250 2.0.0 Ok: queued as 4F8B22C0A
C: QUIT
S: 221 2.0.0 Bye

Two things matter here, and both are subtle.

The envelope is separate from the message. The MAIL FROM and RCPT TO commands are the envelope: the metadata SMTP servers route on. The From: and To: headers inside the DATA block are the message: what mail clients display. These can be completely different, and the gap between them is where almost every spoofing attack lives. Spoofed mail isn't usually about hacking SMTP; it's about a human seeing the message-level From: ceo@company.com and never noticing the envelope said something else.
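
To make the envelope/header gap concrete, here is a minimal sketch using Python's standard email package. The function names and the sample message are invented for illustration, and real alignment checks (SPF, DKIM, DMARC) involve much more than comparing two strings.

```python
from email import message_from_bytes
from email.utils import parseaddr


def header_from_address(raw_message: bytes) -> str:
    """Return the bare address from the message-level From: header."""
    msg = message_from_bytes(raw_message)
    _display_name, address = parseaddr(msg.get("From", ""))
    return address.lower()


def envelope_matches_header(mail_from: str, raw_message: bytes) -> bool:
    """Compare the SMTP envelope sender (MAIL FROM) with the From: header."""
    _name, envelope_address = parseaddr(mail_from)
    return envelope_address.lower() == header_from_address(raw_message)


# Hypothetical message: the header claims the CEO wrote it.
raw = (
    b"From: CEO <ceo@company.com>\r\n"
    b"To: me@example.com\r\n"
    b"Subject: Hello\r\n"
    b"\r\n"
    b"Please wire the money.\r\n"
)

# The envelope says something else entirely, and SMTP does not care.
print(envelope_matches_header("<sender@external.test>", raw))
print(envelope_matches_header("<ceo@company.com>", raw))
```

A mismatch is not proof of spoofing (mailing lists and bulk senders legitimately use different envelope senders), which is part of why this gap is so hard to close.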

The end of DATA is a single dot on a line by itself. Specifically, the byte sequence is \r\n.\r\n (CRLF, dot, CRLF). To send a line that starts with a literal dot, the client escapes it by prepending a second dot, so the line begins with .. on the wire (dot-stuffing), and the receiver strips the extra dot. This is RFC 5321 section 4.5.2, essentially unchanged from what Postel wrote in RFC 821 in 1982. The dot-CRLF terminator is also where SMTP smuggling lived, which we'll get to.
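
Dot-stuffing is easier to believe once you have seen it round-trip. The following is a small sketch, not code from any real MTA: the helper names are made up, and it assumes the body already uses CRLF line endings and that the receiver is handed exactly one DATA payload.

```python
def dot_stuff(body: bytes) -> bytes:
    """Escape leading dots and append the CRLF.CRLF terminator."""
    lines = body.split(b"\r\n")
    if lines and lines[-1] == b"":
        lines.pop()  # body ended with CRLF; drop the empty tail element
    stuffed = [b"." + line if line.startswith(b".") else line for line in lines]
    return b"\r\n".join(stuffed) + b"\r\n.\r\n"


def dot_unstuff(wire: bytes) -> bytes:
    """Reverse dot_stuff: strip the terminator and the extra leading dots."""
    if not wire.endswith(b"\r\n.\r\n"):
        raise ValueError("missing CRLF.CRLF terminator")
    payload = wire[:-3]  # remove ".\r\n", keep the CRLF that ends the last line
    lines = payload.split(b"\r\n")
    lines.pop()  # empty element after the final CRLF
    unstuffed = [line[1:] if line.startswith(b".") else line for line in lines]
    return b"\r\n".join(unstuffed) + b"\r\n"


body = b"Hi\r\n.hidden line\r\n"
wire = dot_stuff(body)
print(wire)
print(dot_unstuff(wire) == body)
```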

STARTTLS: the upgrade handshake

Plain SMTP in 2026 is increasingly rare for client submission, but it's still common between MTAs (server-to-server). The way encryption gets added is STARTTLS, which upgrades an existing plaintext connection to TLS in place.

The dance:

S: 220 mail.example.com ESMTP
C: EHLO sender
S: 250-mail.example.com
S: 250-STARTTLS
S: 250 ...
C: STARTTLS
S: 220 2.0.0 Ready to start TLS
[TLS handshake — ClientHello, ServerHello, certificate, ...]
[All subsequent traffic is encrypted]
C: EHLO sender   <-- repeat after TLS upgrade
S: 250 ...

Two attack-relevant things here:

STARTTLS stripping. A man-in-the-middle attacker can strip the 250-STARTTLS line out of the server's EHLO response. The client never sees the offer, so it never tries to upgrade. The session continues in plaintext, and the MITM reads everything. This is why MTA-STS and DANE exist: they let domains commit to requiring TLS so that a stripped offer is detectable.
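
From the client's side, the difference between opportunistic and enforced TLS comes down to what happens when the offer is missing. Here is a sketch under simplifying assumptions: parse_ehlo, next_command, and the tls_required flag are hypothetical names, and the flag stands in for a policy the client learned out of band (for example via MTA-STS or DANE).

```python
class TLSRequired(Exception):
    """Raised when policy demands TLS but the server did not offer it."""


def parse_ehlo(reply: bytes) -> set:
    """Return the extension keywords from a multi-line 250 EHLO reply."""
    lines = reply.split(b"\r\n")
    if lines and lines[-1] == b"":
        lines.pop()
    keywords = set()
    for index, line in enumerate(lines):
        code, separator, text = line[:3], line[3:4], line[4:]
        if code != b"250" or separator not in (b"-", b" "):
            raise ValueError(f"unexpected reply line: {line!r}")
        if index == 0:
            continue  # first line carries the server's name, not an extension
        keywords.add(text.split(b" ")[0].decode("ascii").upper())
    return keywords


def next_command(reply: bytes, tls_required: bool) -> str:
    """Decide what the client sends after EHLO."""
    if "STARTTLS" in parse_ehlo(reply):
        return "STARTTLS"
    if tls_required:
        raise TLSRequired("no STARTTLS offer; refusing to continue in plaintext")
    return "MAIL"  # opportunistic mode: carry on unencrypted


honest = b"250-mail.example.com\r\n250-PIPELINING\r\n250-STARTTLS\r\n250 SMTPUTF8\r\n"
stripped = b"250-mail.example.com\r\n250-PIPELINING\r\n250 SMTPUTF8\r\n"

print(next_command(honest, tls_required=False))
print(next_command(stripped, tls_required=False))
```

With tls_required set, the stripped reply raises instead of silently downgrading, which is the behaviour the policy mechanisms are meant to enable.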

Plaintext command injection across the upgrade. If the SMTP server doesn't carefully empty its read buffer at the moment STARTTLS succeeds, commands the attacker pre-injected before the TLS handshake can be processed after the handshake, as if they came from the authenticated TLS session. This was Postfix CVE-2011-0411 and equivalent bugs in other servers. The fix is to discard any unread bytes the moment TLS is initiated. If you ever wonder why such a small change in the state machine matters, it's worth reading your server's source around the STARTTLS handler: one missing buffer flush has historically been enough to leak credentials.
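
The buffer bug is easiest to see in a toy model. The class below is not how Postfix or any real server is written; it is a deliberately tiny state machine, with invented names and the TLS handshake elided, that shows what changes when the read buffer is discarded at the upgrade.

```python
class ToySmtpServer:
    """Toy model of the receive buffer around STARTTLS. Not a real server."""

    def __init__(self, flush_on_starttls: bool):
        self.flush_on_starttls = flush_on_starttls
        self.buffer = b""
        self.tls_active = False
        self.executed = []  # (command, tls_active when it was executed)

    def receive_plaintext(self, data: bytes) -> None:
        """Bytes read from the socket before the TLS handshake."""
        self.buffer += data

    def process(self) -> None:
        while b"\r\n" in self.buffer:
            line, self.buffer = self.buffer.split(b"\r\n", 1)
            command = line.decode("ascii")
            if command.upper() == "STARTTLS" and not self.tls_active:
                if self.flush_on_starttls:
                    self.buffer = b""  # the fix: drop bytes read before TLS
                self.tls_active = True  # handshake elided in this model
                continue
            self.executed.append((command, self.tls_active))


# The attacker appends a command to the plaintext STARTTLS request.
injected = b"STARTTLS\r\nMAIL FROM:<attacker@evil.test>\r\n"

vulnerable = ToySmtpServer(flush_on_starttls=False)
vulnerable.receive_plaintext(injected)
vulnerable.process()
print(vulnerable.executed)  # injected command runs with tls_active=True

patched = ToySmtpServer(flush_on_starttls=True)
patched.receive_plaintext(injected)
patched.process()
print(patched.executed)  # nothing left to run
```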

Submission vs SMTPS: the religious war

You'll see two ports for authenticated mail submission:

587 with STARTTLS — Submission, RFC 6409. The connection starts plaintext and upgrades.
465 with implicit TLS — SMTPS. The connection is TLS from the first byte.

Port 465 was deprecated in 1998, then re-blessed in 2018 (RFC 8314). The current best-practice answer is "support both, prefer 465." Why? Because 465 isn't strippable. There's no plaintext phase to strip. STARTTLS works, but you have to enforce it; SMTPS makes the enforcement implicit.

For your config: enable both, require auth on both, require TLS on both. Don't worry about clients — every modern mail client supports both ports.
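
On the client side, the two ports differ only in when the TLS handshake happens. Here is a sketch using Python's standard smtplib; open_submission is a made-up helper, the 10-second timeout is an arbitrary choice, and nothing here is specific to any particular server.

```python
import smtplib
import ssl


def open_submission(host: str, port: int) -> smtplib.SMTP:
    """Open an encrypted submission connection on 465 or 587."""
    context = ssl.create_default_context()  # verifies certificate and hostname
    if port == 465:
        # Implicit TLS: the handshake happens before any SMTP byte is sent.
        return smtplib.SMTP_SSL(host, port, context=context, timeout=10)
    if port == 587:
        client = smtplib.SMTP(host, port, timeout=10)
        client.ehlo()
        # starttls() raises if the server did not advertise STARTTLS,
        # so a stripped offer fails closed instead of continuing in plaintext.
        client.starttls(context=context)
        client.ehlo()  # capabilities must be re-read after the upgrade
        return client
    raise ValueError(f"not a submission port: {port}")
```

A caller would follow this with client.login(...) and client.send_message(...). The point of the sketch is that the 587 branch needs explicit enforcement, while the 465 branch has no plaintext phase to enforce against.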

SMTP smuggling, byte by byte

In late 2023, SEC Consult disclosed SMTP smuggling (CVE-2023-51764 and a flock of related CVEs). The attack works because different SMTP implementations disagree on what bytes terminate the DATA section.

The standard says the message terminates on \r\n.\r\n (CRLF dot CRLF). But some servers, both senders and receivers, historically accepted variants like \n.\n (LF dot LF) as message terminators, in the name of being "lenient with what they accept." The result: a sender can write a message body that looks like a single message to a strict parser, but contains an embedded \n.\n followed by a fully-formed second SMTP transaction that a lenient parser will treat as a separate message.

Stripped down to bytes:

DATA
First message body
\n.\n            <-- looks like end-of-data to lenient parser
MAIL FROM:<spoofed@bigcorp.com>
RCPT TO:<victim@target.com>
DATA
Second message — appears to come from inside the trusted server
\r\n.\r\n        <-- real end-of-data per the strict parser

When the outbound MTA treats the payload as a single message and relays it untouched, the receiving MTA (the lenient one) may split it into two messages. The second one arrives over the outbound server's legitimate connection, so it inherits that session's trust, including the sending IP address that SPF checks are evaluated against. Result: an attacker on the outside can inject a message that appears to come from inside the victim's mail infrastructure.
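
The parser disagreement fits in a few lines. This sketch models only the end-of-data search, with two hypothetical terminator tables standing in for a strict and a lenient implementation; real servers differ in which variants they accepted.

```python
STRICT_TERMINATORS = (b"\r\n.\r\n",)
LENIENT_TERMINATORS = (b"\r\n.\r\n", b"\n.\n")


def end_of_data(stream: bytes, terminators) -> int:
    """Return the index just past the earliest terminator, or -1 if none."""
    candidates = []
    for terminator in terminators:
        start = stream.find(terminator)
        if start != -1:
            candidates.append((start, start + len(terminator)))
    return min(candidates)[1] if candidates else -1


payload = (
    b"Subject: first\r\n\r\nFirst message body"
    b"\n.\n"  # bare-LF dot line smuggled into the body
    b"MAIL FROM:<spoofed@bigcorp.com>\r\n"
    b"RCPT TO:<victim@target.com>\r\n"
    b"DATA\r\n"
    b"Subject: second\r\n\r\nSecond message\r\n"
    b".\r\n"
)

strict_end = end_of_data(payload, STRICT_TERMINATORS)
lenient_end = end_of_data(payload, LENIENT_TERMINATORS)

# The strict parser sees one message that runs to the very end.
print(strict_end == len(payload))
# The lenient parser stops early and reads the rest as new SMTP commands.
print(payload[lenient_end:].split(b"\r\n")[0])
```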

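Postfix's response was to add smtpd_forbid_bare_newline, which makes Postfix require the standard CRLF.CRLF end-of-data sequence instead of accepting a bare-LF dot line as a terminator. Postfix 3.9 and later enable it by default with the value normalize (yes is accepted as an alias), and the stricter value reject refuses bare-LF lines outright. Patch your server, set the option explicitly, and verify with a packet capture.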
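
For the verification step, it helps to have a quick way to spot bare newlines in a payload exported from a capture. This is a minimal sketch: find_bare_newlines is a made-up helper, and it assumes you already have the raw bytes of one direction of the stream.

```python
import re

# A line feed not preceded by CR, or a carriage return not followed by LF.
BARE_NEWLINE = re.compile(rb"(?<!\r)\n|\r(?!\n)")


def find_bare_newlines(data: bytes) -> list:
    """Return the byte offsets of every bare LF or bare CR in data."""
    return [match.start() for match in BARE_NEWLINE.finditer(data)]


clean = b"Subject: ok\r\n\r\nbody\r\n.\r\n"
suspicious = b"body\n.\nMAIL FROM:<spoofed@bigcorp.com>\r\n"

print(find_bare_newlines(clean))
print(find_bare_newlines(suspicious))
```

An empty list for your own outbound traffic is what you want to see. Offsets in inbound traffic tell you where to look in the stream.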

IMAP: tagged commands and long sessions

IMAP is also text, also CRLF-terminated, but it has structure SMTP doesn't. Every client command is prefixed with a tag (a short string the client picks) so responses can be correlated to commands across pipelined operations. A typical session:

S: * OK [CAPABILITY IMAP4rev1 ...] Dovecot ready.
C: a001 LOGIN me@example.com mypassword
S: a001 OK Logged in
C: a002 SELECT INBOX
S: * 12 EXISTS
S: * 0 RECENT
S: * OK [UNSEEN 7] First unseen.
S: * OK [UIDVALIDITY 1234567890]
S: * OK [UIDNEXT 13]
S: * FLAGS (\Answered \Flagged \Deleted \Seen \Draft)
S: a002 OK [READ-WRITE] Select completed.
C: a003 FETCH 1 (BODY[HEADER])
S: * 1 FETCH (BODY[HEADER] {537}
S: From: ...
S: To: ...
S: Subject: ...
S: )
S: a003 OK Fetch completed.
C: a004 IDLE
S: + idling
[connection sits open, server pushes EXISTS responses when new mail arrives]
C: DONE
S: a004 OK Idle completed.
C: a005 LOGOUT

Three things to notice:

The {537} literal. When the server is about to send a chunk of binary-ish data (a message body, headers with arbitrary bytes), it announces the byte count first: {537} means "the next 537 bytes are payload, count them, don't look for a CRLF terminator." This is how IMAP handles 8-bit data without escaping.
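
Counting bytes instead of scanning for a delimiter looks like this in practice. The sketch handles only the plain {n} form described above (not the non-synchronizing or binary literal variants), and read_literal is an invented name, not part of any IMAP library.

```python
import re

LITERAL_MARKER = re.compile(rb"\{(\d+)\}\r\n")


def read_literal(buffer: bytes):
    """Return (payload, rest) for the first {n} literal, or None if incomplete."""
    match = LITERAL_MARKER.search(buffer)
    if match is None:
        return None
    size = int(match.group(1))
    start = match.end()
    if len(buffer) < start + size:
        return None  # the announced bytes have not all arrived yet
    return buffer[start:start + size], buffer[start + size:]


# The payload contains ")" and CRLF, which would confuse a delimiter scan.
headers = b"From: a@example.com\r\nSubject: 1) hi\r\n\r\n"
response = (
    b"* 1 FETCH (BODY[HEADER] {%d}\r\n" % len(headers)
    + headers
    + b")\r\na003 OK Fetch completed.\r\n"
)

payload, rest = read_literal(response)
print(payload == headers)
print(rest)
```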

IDLE keeps connections open. Modern mail clients use IDLE to receive push notifications when new mail arrives, instead of polling. A typical mailbox has connections open all the time. This is why the post-auth IMAP attack surface matters even when you're not actively doing anything: the session is persistent, the parser is running.

The session is stateful in non-trivial ways. SELECT chooses a mailbox; subsequent operations are scoped to it. Clients track UIDVALIDITY, UIDNEXT, and EXISTS to notice new mail and to detect mailbox renames or rebuilds (a changed UIDVALIDITY means every cached UID is invalid). State complexity is one of the reasons IMAP servers have a longer history of bugs than POP3 servers.
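
The client-side bookkeeping this implies is small but easy to get wrong. Here is a sketch of the cache decision a client makes after SELECT; MailboxCache and on_select are hypothetical names, and a real client tracks considerably more state than this.

```python
from dataclasses import dataclass, field


@dataclass
class MailboxCache:
    uidvalidity: int
    uidnext: int
    messages: dict = field(default_factory=dict)  # UID -> cached message data


def on_select(cache, uidvalidity: int, uidnext: int) -> MailboxCache:
    """Return the cache to use after SELECT reports fresh values."""
    if cache is None or cache.uidvalidity != uidvalidity:
        # The mailbox was rebuilt or replaced: old UIDs mean nothing now.
        return MailboxCache(uidvalidity, uidnext)
    cache.uidnext = uidnext  # same mailbox; keep what we already downloaded
    return cache


cache = MailboxCache(uidvalidity=1234567890, uidnext=13, messages={7: b"..."})

same = on_select(cache, uidvalidity=1234567890, uidnext=14)
print(same.messages)

rebuilt = on_select(cache, uidvalidity=1234567999, uidnext=2)
print(rebuilt.messages)
```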

POP3: brutally simple, surprisingly alive

POP3 (RFC 1939) is the protocol for mail clients that just want to download messages and stop talking. Seven commands cover most of the protocol:

S: +OK Dovecot ready.
C: USER me@example.com
S: +OK
C: PASS mypassword
S: +OK Logged in.
C: STAT
S: +OK 12 4567
C: LIST
S: +OK 12 messages:
S: 1 350
S: 2 421
S: ...
S: .
C: RETR 1
S: +OK 350 octets
S: From: ...
S: ...
S: .
C: DELE 1
S: +OK Marked to be deleted.
C: QUIT
S: +OK Logging out, messages deleted.

POP3 has the same dot-on-a-line message terminator as SMTP, the same dot-stuffing rules, and a much smaller state machine. It's not dead — many corporate setups still use POP3 for archival mailboxes — but you mostly see it from old Outlook clients and embedded devices.
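
The smaller state machine shows up in how little code a response parser needs. The sketch below covers the STAT and LIST replies from the session above; the function names are made up, and it assumes the lines have already been split and stripped of their CRLF.

```python
def parse_stat(line: str):
    """Parse '+OK <count> <octets>' into (count, octets)."""
    parts = line.split()
    if len(parts) < 3 or parts[0] != "+OK":
        raise ValueError(f"unexpected STAT reply: {line!r}")
    return int(parts[1]), int(parts[2])


def parse_list(lines):
    """Parse a multi-line LIST reply into {message number: octets}."""
    if not lines or not lines[0].startswith("+OK"):
        raise ValueError("LIST failed or reply is empty")
    sizes = {}
    for line in lines[1:]:
        if line == ".":
            return sizes  # a lone dot ends the multi-line reply
        number, octets = line.split()[:2]
        sizes[int(number)] = int(octets)
    raise ValueError("missing terminating dot line")


print(parse_stat("+OK 12 4567"))
print(parse_list(["+OK 12 messages:", "1 350", "2 421", "."]))
```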

The Wireshark lab

This is the whole point of the post: you can see all of this on your own server. On the server:

bash

# Capture all mail-related traffic to a pcap
sudo tcpdump -i any -w /tmp/mail.pcap \
    'port 25 or port 110 or port 143 or port 465 or port 587 or port 993 or port 995'

Then send some traffic:

bash

# SMTP test
swaks --to me@example.com --from sender@external.test --server localhost:25

# IMAP test (will be encrypted on 993, useful for comparison)
# -quiet implies -ign_eof, so s_client waits for the server's replies
# instead of exiting when the here-document ends
openssl s_client -connect localhost:993 -crlf -quiet <<EOF
a1 LOGIN me@example.com mypassword
a2 LIST "" "*"
a3 LOGOUT
EOF

# POP3 over plaintext (don't use this against a server you don't own)
# nc sends bare LF line endings here; most servers tolerate that
nc localhost 110 <<EOF
USER me@example.com
PASS mypassword
STAT
QUIT
EOF

Stop the capture, open it in Wireshark, and sort by stream. Right-click any frame → "Follow → TCP Stream" gives you the entire conversation as readable text. For TLS-encrypted streams (993, 465, 587 after STARTTLS), you'll see only the handshake and ciphertext unless you've configured Wireshark to use the server's session keys (which is possible but a separate post).

For a clean teaching capture, run two captures: one with TLS disabled on a test box to see the protocol clearly, one with TLS to see how STARTTLS upgrades a session midway through. The diff between the two is what most textbook diagrams skip.

What I'd Tell My Past Self

The first time I read SMTP at the byte level, the thing that stuck wasn't any single command — it was the realisation that all these protocols are old, simple, and trust-based by design. They were built when the network was small and everyone was vaguely accountable. Every security feature you see — STARTTLS, SPF, DKIM, DMARC, DANE, MTA-STS — is a patch bolted onto a protocol that originally just trusted whoever was speaking it.

Once you understand that, the misconfigurations stop being mysterious. They're what happens when the patches don't all line up. Post 5 covers how mail sits on disk after these protocols have done their work, and Post 6 walks through the CVEs that the bolted-on security keeps almost-but-not-quite preventing.

If you take one thing from this post: open Wireshark on your own mail server at least once. Stare at it for an hour. Whatever else you do with email security, that hour will pay back its cost ten times over.
