Path MTU Discovery and Filtering ICMP

Marc Slemko marcs@znep.com
Created: Thursday, January 18 1998
Last Modified: Wednesday, October 4 2006

This document explains the details of how path MTU discovery (PMTU-D) combined with filtering ICMP messages can result in connectivity problems. If you are familiar with the terms discussed

Let's start by defining what we are talking about

MTU
The maximum transmission unit is a link layer restriction on the maximum number of bytes of data in a single transmission (ie. frame, cell, packet, depending on the terminology). The below table shows some typical values for MTUs, taken from RFC-1191:

MTU	Where Commonly Used
65535	Hyperchannel
17914	16 Mbit/sec token ring
8166	Token Bus (IEEE 802.4)
4464	4 Mbit/sec token ring (IEEE 802.5)
1500	Ethernet
1500	PPP (typical; can vary widely)
576	X.25 Networks

Path MTU
The smallest MTU of any link on the current path between two hosts. This may change over time since the route between two hosts, especially on the Internet, may change over time. It is not necessarily symmetric and can even vary for different types of traffic from the same host.

Fragmentation
When a packet is too large to be sent across a link as a single unit, a router can fragment the packet. This means that it splits it into multiple parts which contain enough information for the receiver to glue them together again. Note that this is not done on a hop-by-hop basis, but once fragmented a packet will not be put back together until it reaches its destination. Fragmentation is undesirable for numerous reasons, including:

If any one fragment from a packet is dropped, the entire packet needs to be retransmitted. This is a very significant problem.
It imposes extra processing load on the routers that have to split the packets.
In some configuration, simpler firewalls will block all fragments because they don't contain the header information for a higher layer protocol (eg. TCP) needed for filtering.

DF (Don't Fragment) bit
This is a bit in the IP header that can be set to indicate that the packet should not be fragmented by routers, but instead an ICMP "can't fragment" error is returned sent to the sender and the packet is dropped.

ICMP Can't Fragment Error
This error (type 3 (destination unreachable), code 4 (fragmentation needed but don't-fragment bit set)) is returned by a router when it receives a packet that is too large for it to forward and the DF bit is set. The packet is dropped and the ICMP error is sent back to the origin host. Normally, this tells the origin host that it needs to reduce the size of its packets if it wants to get through. Recent systems also include the MTU of the next hop in the ICMP message so the source knows how big its packets can be. Note that this error is only sent if the DF bit is set; otherwise, packets are just fragmented and passed through.

MSS
The MSS is the maximum segment size. It can be announced during the establishment of a TCP connection to indicate to the other end the largest amount of data in one packet that should be sent by the remote system. Normally the packet generated will be 40 bytes larger than this; 20 bytes for the IP header and 20 for the TCP header. Most systems announce a MSS that is determined from the MTU on the interface that the traffic to the remote system passes out from the system through.

Path MTU Discovery (PMTU-D)
Now you know that Path MTUs vary. You know that fragmentation is bad. The solution? Well, one solution is Path MTU Discovery. The idea behind it is to send packets that are as large as possible while still avoiding fragmentation. A host does this by starting by sending packets that have a maximum size of the lesser of the local MTU or the MSS announced by the remote system. These packets are sent with the DF bit set. If there is some MTU between the two hosts which is too small to pass the packet successfully, then an ICMP can't fragment error will be sent back to the source. It will then know to lower the size; if the ICMP message includes the next hop MTU, it can pick the correct size for that link immediately, otherwise it has to guess.

The exact process that systems go through is somewhat more complicated to account for special circumstances. For full details, see RFC-1191.

A good indication of if a system is trying to do PMTU-D is to watch the packets it is sending with something like tcpdump or snoop and see if they have the DF bit set; if so, it is most likely trying to do PMTU-D.

Now, to the problem with ICMP filtering and PMTU-D

Now we get to the problem. Many network administrators have decided to filter ICMP at a router or firewall. There are valid (and many invalid) reasons for doing this, however it can cause problems. ICMP is an integral part of the Internet and can not be filtered without due consideration for the effects.

In this case, if the ICMP can't fragment errors can not get back to the source host due to a filter, the host will never know that the packets it is sending are too large. This means it will keep trying to send the same large packet, and it will keep being dropped–silently dropped from the view of any system on the other side of the filter. While a small handful of systems that implement PMTU-D also implement a way to detect such situations, most don't and even for those that do it has a negative impact on performance and the network.

If this is happening, typical symptoms include the ability for small packets (eg. request a very small web page) to get through, but larger ones (eg. a large web page) will simply hang. This situation can be confusing to the novice administrator because they obviously have some connectivity to the host, but it just stops working for no obvious reason on certain transfers.

There is one solution, and several workarounds, for this problem. They include:

Fix your filters! The real problem here is filtering ICMP messages without understanding the consequences. Many packet filters will allow you to setup filters to only allow certain types of ICMP messages through. If you reconfigure them to let ICMP can't fragment (type 3, code 4) messages through, the problem should disappear. If the filter is somewhere between you and the other end, contact the administrator of that machine and try to convince them to fix the problem.
Reduce the MTU on the machines at one end or the other. This is a workaround and should not be done unless necessary. If you reduce the MTU on the system trying to do path MTU discovery to a point where it is less than or equal to the former path MTU, it will no longer try sending packets large enough to cause problems. Similarly, if you change the MTU on the system on the other end, it will advertise a lower MSS so the sending system will only send packets with data that fits into that MSS.
Disable PMTU-D; if you control access to the machine that is trying to do PMTU-D, and are unable to get the person administering the bogus filter to fix it, disabling PMTU-D will fix the problem for data sent by that machine. Data being received by the machine, however, can still run into the problem. With the size that HTTP requests are growing to, this could start to be a problem more and more; historically, HTTP requests have nearly always been small enough to fit through links with small MTUs in one packet. Disabling PMTU-D is simply a workaround, and should not generally be done unless necessary or you know what you are doing.

So how can using RFC 1918 addresses for router links cause problems?

On many routers, a separate IP address in the same subnet is required for each end of a point to point link. This can use address space if there are a large number of such links. Since the actual address of the links doesn't appear to impact much, many people use RFC 1918 private address space for such links. The blocks included in this are:

10.0.0.0 - 10.255.255.255 (10/8 prefix)

172.16.0.0 - 172.31.255.255 (172.16/12 prefix)

192.168.0.0 - 192.168.255.255 (192.168/16 prefix)

If you are using such addresses, then ICMP messages (including "can't fragment" errors) will normally be generated using such addresses. Since many networks filter incoming traffic from such reserved addresses, the net result is the same as if all ICMP were being filtered and can cause the same problems.

Discussion

Real name:

E-Mail:

Enter your comment. Wiki syntax is allowed:

Нужно набрать RTFM на русском языке Please keep this field empty:

Subscribe to comments

RTFM.WIKI

Table of Contents

Path MTU Discovery and Filtering ICMP

Let's start by defining what we are talking about

Now, to the problem with ICMP filtering and PMTU-D

So how can using RFC 1918 addresses for router links cause problems?

Discussion