
Path MTU Discovery Black Hole Problem and Solution for PPPoE Router

For an immediate solution, skip to "Known workarounds" below.

What is this page for?

This page describes a common problem called "PMTUD Black Hole". Its typical symptoms are listed under "Applications' behaviour" below.

Some workarounds are also discussed.

Before reading further, you may want to look at the chart of the network configuration under discussion.

What is the PMTUD Black Hole problem?

MTU (Maximum Transmission Unit) is a link parameter, determined by the type of the underlying medium, that specifies the maximum size of a packet that can be transmitted over the link.

The PPPoE service of Tokyo Metallic Communications' Single Plan uses an MTU of 1454 bytes, and NTT's FLET'S ADSL uses the same value. An ordinary Ethernet LAN, as well as normal PPP over an analog modem or ISDN, uses 1500 bytes. This 46-byte gap is critical.
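The arithmetic behind the gap can be sketched in a few lines of Python. This assumes plain IPv4 and TCP headers of 20 bytes each with no options, so the largest TCP payload (MSS) for a given MTU is the MTU minus 40; the constant and function names are illustrative only.

```python
IPV4_HEADER = 20  # IPv4 header without options
TCP_HEADER = 20   # TCP header without options

LAN_MTU = 1500    # Ethernet LAN, PPP over analog modem, ISDN
PPPOE_MTU = 1454  # Tokyo Metallic Single Plan / NTT FLET'S ADSL


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one IPv4 packet of `mtu` bytes,
    assuming neither IP nor TCP options are present."""
    return mtu - IPV4_HEADER - TCP_HEADER


gap = LAN_MTU - PPPOE_MTU
print("MTU gap:", gap)                                   # 46
print("Payload a LAN host may send:", mss_for_mtu(LAN_MTU))      # 1460
print("Payload the PPPoE link carries:", mss_for_mtu(PPPOE_MTU))  # 1414
```

So a host that trusts its 1500-byte MTU may send 1460-byte payloads, while only 1414 bytes of payload fit through the PPPoE link in one packet.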

A machine that talks PPPoE directly has no problem with any MTU value, because its TCP/IP stack knows the exact MTU of the PPPoE link. However, when the PPPoE client runs on a router, as in our case, a machine on the LAN doesn't know that the Internet connection goes through PPPoE and only sees the MTU of the LAN (1500). The same is true of the server you are connecting to.

Modern TCP/IP implementations use a technique called Path MTU Discovery (RFC 1191) to handle this case: the sender marks its packets "Don't Fragment", and a router that cannot forward such a packet over a link with a smaller MTU drops it and returns an ICMP message reporting that link's MTU, so the sender can reduce its packet size. Unfortunately, PMTUD can itself trigger another problem, known as PMTUD Black Hole (RFC 2923), which prevents connections to particular Internet hosts under certain conditions. It occurs when the ICMP "Destination Unreachable -- Fragmentation Needed" messages, which are essential to the PMTUD algorithm, are blocked somewhere and never reach the source host of the IP datagram. The conditions causing it include:
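The difference between working PMTUD and a black hole can be illustrated with a toy Python model. Everything here is a hypothetical simplification: each hop is just a name and an MTU, every packet carries the DF bit, and the sender either receives the "Fragmentation Needed" message (with the next-hop MTU, as in RFC 1191) or receives nothing at all. It is not how any real TCP stack is structured.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hop:
    name: str
    mtu: int  # largest IP packet this link can carry


def forward(size: int, path: List[Hop], icmp_blocked: bool) -> Tuple[bool, Optional[int]]:
    """Forward one packet that has the Don't Fragment (DF) bit set.

    Returns (delivered, reported_mtu), where reported_mtu is the next-hop MTU
    from an ICMP "Fragmentation Needed" message that reached the sender.
    """
    for hop in path:
        if size > hop.mtu:
            # DF is set, so the router drops the packet instead of fragmenting.
            if icmp_blocked:
                return False, None   # the error never arrives: a black hole
            return False, hop.mtu    # RFC 1191: the error carries the next-hop MTU
    return True, None


def send(wanted: int, path: List[Hop], icmp_blocked: bool,
         first_mtu: int = 1500, max_tries: int = 5) -> Tuple[bool, int]:
    """Try to deliver a packet of up to `wanted` bytes using PMTUD.

    Returns (delivered, final path-MTU estimate).
    """
    path_mtu = first_mtu  # initial guess: the MTU of the sender's own interface
    for _ in range(max_tries):
        delivered, reported = forward(min(wanted, path_mtu), path, icmp_blocked)
        if delivered:
            return True, path_mtu
        if reported is not None:
            path_mtu = reported  # shrink the estimate and retransmit
        # otherwise the same oversized packet is retransmitted and lost again
    return False, path_mtu


# Server -> client path in the configuration discussed on this page.
path = [Hop("server LAN", 1500), Hop("Internet", 1500),
        Hop("PPPoE link", 1454), Hop("client LAN", 1500)]
print(send(1500, path, icmp_blocked=False))  # (True, 1454)
print(send(1500, path, icmp_blocked=True))   # (False, 1500)
print(send(500, path, icmp_blocked=True))    # (True, 1500): small packets pass
```

The last call shows why small exchanges such as the TCP handshake still succeed even when the ICMP message is blocked.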

When this happens, TCP connections to that particular server can still be established, because the handshake packets are small, but no data can be exchanged. Hence the name "Black Hole".

[NOTE] In fact, a little data may get through (discussed later), which makes troubleshooting harder.

[NOTE] "The router terminating PPPoE" refers to the one at the far end of the ADSL line, owned by your provider, not to yours. "IP-filtering firewall" and "NAT router" refer to those located near the target server.

How can we identify a PMTUD Black Hole?

PMTUD Black Hole characteristics

It is not easy to determine whether a particular trouble is caused by a PMTUD Black Hole. Knowing the characteristics of the PMTUD Black Hole may help you identify the problem.

The key point is that an apparently working connection suddenly stops at the moment the server has to send a large amount of data. This happens on a per-connection basis. Even after one connection stops working, other connections to the same server can still be made, and existing connections keep working unless large packets are sent.
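A rough Python model shows why only large transfers stall. It assumes, hypothetically, that the server believes its path MTU is 1500 (so it fills segments with up to 1460 bytes of payload), that each application write leaves in its own segments without coalescing, and that every packet larger than the 1454-byte PPPoE MTU vanishes because the ICMP error is blocked. Since TCP hands data to the application only in order, everything after the first lost segment is stuck.

```python
from typing import List

HEADERS = 40       # IPv4 + TCP headers, no options
SERVER_MSS = 1460  # server assumes a 1500-byte path MTU
PPPOE_MTU = 1454


def segments(write_len: int, mss: int = SERVER_MSS) -> List[int]:
    """Split one application write into TCP payload sizes of at most `mss`."""
    sizes = []
    while write_len > 0:
        chunk = min(write_len, mss)
        sizes.append(chunk)
        write_len -= chunk
    return sizes


def readable_bytes(writes: List[int]) -> int:
    """Bytes the client can read before the connection stalls.

    Hypothetical model: ICMP is blocked, so any packet larger than the PPPoE
    MTU is silently dropped, and TCP cannot deliver data past that gap.
    """
    delivered = 0
    for write in writes:
        for payload in segments(write):
            if payload + HEADERS > PPPOE_MTU:
                return delivered  # the server keeps resending the oversized packet
            delivered += payload
    return delivered


print(readable_bytes([300]))           # short reply: 300
print(readable_bytes([80, 80, 2000]))  # two banner lines, then a stall: 160
print(readable_bytes([100000]))        # large file: 0
```

The middle case mirrors the TELNET symptom described below, where the first line or two of a long login banner appears and then the session hangs.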

Applications' behaviour

Typically, applications behave as follows when the PMTUD Black Hole problem occurs. (Various factors affect the exact behaviour; in particular, the point at which a connection stops working varies with server load.)

HTTP
It connects to the server, but no data is received.
FTP
The control connection to the server is established. You can log in, change directories with cd, and switch modes (e.g., with commands such as binary, prompt, glob). You can also use the dir (ls) command if the directory contains only a few files. You can even put files. If you try to get a file, however, it just hangs unless the file is small.
TELNET
You can connect to the host and log in. However, if the machine's login banner (such as /etc/motd) is long, the TELNET session hangs just after a successful login (or after the first one or two lines of the banner are shown). When the login banner is short enough, you can see the shell prompt and use several commands, but commands such as ls -l or more, which send a lot of output to the terminal, make the TELNET session hang.
SMTP (Send)
No problem at all.
POP3
You can log in to the server. You can see the list of received mails if the mailbox is nearly empty, and you can read short mails (up to about 10 lines). If you try to get the list of mails in a large mailbox or to read a long mail, it hangs.
Real Player or Windows Media Player
When TCP or HTTP transport is used, playback first works fine but then suddenly stops. The player says something like "Network is busy" or "Rebuffering", but never resumes. If you are unlucky, the player may hang during initial buffering. When UDP is used, it works fine.

PPPoE-specific Black Hole characteristics

The PMTUD Black Hole problem caused by a PPPoE connection in our particular network configuration has another significant characteristic: applications running on the router machine itself (i.e., the FreeBSD box talking PPPoE) are unaffected, because its TCP/IP stack knows the MTU of the PPPoE link. For example, even if FTP from a machine on the LAN to a particular server doesn't work, you can still FTP to the same server from the router.

Known workarounds

Two workarounds are currently known: one is to upgrade the ppp command and use its MSS fixup feature; the other is to adjust the default MTU values on the local machines.

Using MSS fixup feature of new ppp

TCP has an option called Maximum Segment Size (MSS), which a host uses to tell the remote machine the largest segment it is willing to receive. (Strictly speaking, MSS is not the MTU, and I don't want to go into the details of the TCP specification here; it is normally the MTU minus 40 bytes of IP and TCP headers.) The latest FreeBSD ppp command has a new feature that rewrites the MSS value in passing packets (much as NAT rewrites addresses), so that the remote host learns the MTU of the PPPoE link.
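The idea of MSS fixup can be sketched in Python as follows. This is a hedged illustration, not ppp's actual code: it assumes the input is the raw TCP header of one IPv4 segment, lowers an MSS option found on a SYN to a chosen maximum (1414 would fit the 1454-byte PPPoE MTU), and patches the TCP checksum with the incremental-update formula of RFC 1624 instead of recomputing it. The helper `tcp_checksum` exists only to check the result and ignores the pseudo-header, which this rewrite does not change.

```python
import struct

SYN = 0x02                       # SYN bit in the TCP flags byte
OPT_EOL, OPT_NOP, OPT_MSS = 0, 1, 2


def oc_add(a: int, b: int) -> int:
    """16-bit one's complement addition with end-around carry."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def incremental_checksum(hc: int, old: int, new: int) -> int:
    """RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m'), in 16-bit one's complement."""
    s = oc_add(~hc & 0xFFFF, ~old & 0xFFFF)
    return ~oc_add(s, new) & 0xFFFF


def swap16(v: int) -> int:
    return ((v & 0xFF) << 8) | (v >> 8)


def clamp_mss(tcp: bytearray, max_mss: int) -> bool:
    """Lower the MSS option of a SYN segment to max_mss in place.

    Returns True if the segment was modified.
    """
    if not tcp[13] & SYN:
        return False                      # MSS only appears on SYN segments
    end = (tcp[12] >> 4) * 4              # header length from the data offset
    i = 20                                # options follow the 20-byte fixed header
    while i < end:
        kind = tcp[i]
        if kind == OPT_EOL:
            break
        if kind == OPT_NOP:
            i += 1
            continue
        if i + 1 >= end or tcp[i + 1] < 2 or i + tcp[i + 1] > end:
            break                         # malformed options: leave untouched
        length = tcp[i + 1]
        if kind == OPT_MSS and length == 4:
            (mss,) = struct.unpack_from("!H", tcp, i + 2)
            if mss <= max_mss:
                return False
            old, new = mss, max_mss
            if (i + 2) % 2:               # value straddles two checksum words
                old, new = swap16(old), swap16(new)
            (hc,) = struct.unpack_from("!H", tcp, 16)
            struct.pack_into("!H", tcp, i + 2, max_mss)
            struct.pack_into("!H", tcp, 16, incremental_checksum(hc, old, new))
            return True
        i += length
    return False


def tcp_checksum(tcp: bytes) -> int:
    """From-scratch checksum over the segment only (pseudo-header omitted)."""
    data = bytes(tcp) + (b"\0" if len(tcp) % 2 else b"")
    s = 0
    for k in range(0, len(data), 2):
        if k != 16:                       # skip the checksum field itself
            s = oc_add(s, (data[k] << 8) | data[k + 1])
    return ~s & 0xFFFF


# Usage: a SYN advertising MSS 1460 (derived from a 1500-byte LAN MTU).
syn = bytearray(struct.pack("!HHIIBBHHH", 12345, 80, 1000, 0, 0x60, SYN, 65535, 0, 0))
syn += bytes([OPT_MSS, 4, 0x05, 0xB4])    # MSS option: 0x05B4 == 1460
struct.pack_into("!H", syn, 16, tcp_checksum(syn))
clamp_mss(syn, 1414)                      # 1454 (PPPoE MTU) - 40
```

Updating the checksum incrementally means only the changed 16-bit value has to be examined, which suits a router rewriting every SYN it forwards.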

The feature was added to ppp in Dec. 2000, so FreeBSD 4.2 RELEASE was distributed without it. The upcoming 4.3 RELEASE will include it, of course. You can, however, use the MSS fixup feature on 4.2 RELEASE or earlier versions by replacing the ppp command with the latest one.

As of this writing (Mar. 2001), a new ppp binary supporting MSS fixup is available at http://people.freebsd.org/~brian/ppp-010204-4.2-STABLE.bin.tar.gz. The binary should work on FreeBSD 4.1 RELEASE (and possibly slightly older 4-STABLE) or later.

In this new ppp command, the MSS fixup feature (enable tcpmssfixup) is active by default, so your problem should disappear as soon as you replace the ppp command. (If it doesn't, you now know your problem was not the PMTUD Black Hole...)

Adjusting the MTU value on local machines

If you don't want to replace ppp, you can follow the FreeBSD FAQ to adjust the MTU values of the local machines on the LAN. Note that you have to adjust every machine on the LAN. I think the idea that "All guiltiness is of TELCO" is incorrect, but the workaround explained there is accurate. For use with Tokyo Metallic Communications' Single Plan or NTT's FLET'S ADSL, no extra calculation is needed; you can simply use 1454 as the MTU.