Introduction

Hi guys.

This document is a journey; some parts are well-traveled, and in other areas you will find yourself almost alone. The best advice I can give you is to grab a large, cozy mug of coffee or hot chocolate, get into a comfortable chair, and absorb the contents before venturing out into the sometimes dangerous world of network hacking.

What is netfilter?

netfilter is a framework for packet mangling, outside the normal Berkeley socket interface. It has four parts. Firstly, each protocol defines “hooks” (IPv4 defines 5) which are well-defined points in a packet's traversal of that protocol stack. At each of these points, the protocol will call the netfilter framework with the packet and the hook number.

Secondly, parts of the kernel can register to listen to the different hooks for each protocol. So when a packet is passed to the netfilter framework, it checks to see if anyone has registered for that protocol and hook; if so, they each get a chance to examine (and possibly alter) the packet in order, then discard the packet (NF_DROP), allow it to pass (NF_ACCEPT), tell netfilter to forget about the packet (NF_STOLEN), or ask netfilter to queue the packet for userspace (NF_QUEUE).

The third part is that packets that have been queued are collected (by the ip_queue driver) for sending to userspace; these packets are handled asynchronously.

The final part consists of cool comments in the code and documentation. This is instrumental for any experimental project. The netfilter motto is (stolen shamelessly from Cort Dougan):

      ``So... how is this better than KDE?''

(This motto narrowly edged out `Whip me, beat me, make me use ipchains').

In addition to this raw framework, various modules have been written which provide functionality similar to previous (pre-netfilter) kernels, in particular, an extensible NAT system, and an extensible packet filtering system (iptables).

What's wrong with what we had in 2.0 and 2.2?

No infrastructure established for passing packets to userspace:

  • Kernel coding is hard
  • Kernel coding must be done in C/C++
  • Dynamic filtering policies do not belong in kernel
  • 2.2 introduced copying packets to userspace via netlink, but reinjecting packets is slow, and subject to `sanity' checks. For example, reinjecting a packet claiming to come from an existing interface is not possible.

Transparent proxying is a crock:

  • We look up every packet to see if there is a socket bound to that address
  • Root is allowed to bind to foreign addresses
  • Can't redirect locally-generated packets
  • REDIRECT doesn't handle UDP replies: redirecting UDP named packets to 1153 doesn't work because some clients don't like replies coming from anything other than port 53.
  • REDIRECT doesn't coordinate with tcp/udp port allocation: a user may get a port shadowed by a REDIRECT rule.
  • Has been broken at least twice during 2.1 series.
  • Code is extremely intrusive. Consider the stats on the number of #ifdef CONFIG_IP_TRANSPARENT_PROXY in 2.2.1: 34 occurrences in 11 files. Compare this with CONFIG_IP_FIREWALL, which has 10 occurrences in 5 files.

Creating packet filter rules independent of interface addresses is not possible:

  • Must know local interface addresses to distinguish locally-generated or locally-terminating packets from through packets.
  • Even that is not enough in cases of redirection or masquerading.
  • Forward chain only has information on outgoing interface, meaning you have to figure out where a packet came from using knowledge of the network topology.

Masquerading is tacked onto packet filtering:

 Interactions between packet filtering and masquerading make firewalling complex: 
  • At input filtering, reply packets appear to be destined for box itself
  • At forward filtering, demasqueraded packets are not seen at all
  • At output filtering, packets appear to come from local box

TOS manipulation, redirect, ICMP unreachable and mark (which can affect port forwarding, routing, and QoS) are tacked onto packet filter code as well.

ipchains code is neither modular, nor extensible (eg. MAC address filtering, options filtering, etc).

Lack of sufficient infrastructure has led to a profusion of different techniques:

  • Masquerading, plus per-protocol modules
  • Fast static NAT by routing code (doesn't have per-protocol handling)
  • Port forwarding, redirect, auto forwarding
  • The Linux NAT and Virtual Server Projects.

Incompatibility between CONFIG_NET_FASTROUTE and packet filtering:

  • Forwarded packets traverse three chains anyway
  • No way to tell if these chains can be bypassed

Inspection of packets dropped due to routing protection (eg. Source Address Verification) not possible.

No way of atomically reading counters on packet filter rules.

CONFIG_IP_ALWAYS_DEFRAG is a compile-time option, making life difficult for distributions who want one general-purpose kernel.

Who are you?

I'm the only one foolish enough to do this. I see many of the problems that people have with the current system, as well as getting exposure to what they are trying to do.

Why does it crash?

Woah! You should have seen it last week!

Because I'm not as great a programmer as we might all wish, and I certainly haven't tested all scenarios, because of lack of time, equipment and/or inspiration. I do have a testsuite, which I encourage you to contribute to.

Where Can I Get The Latest?

There is a CVS server on netfilter.org which contains the latest HOWTOs, userspace tools and testsuite. For casual browsing, you can use the Web Interface (http://cvs.netfilter.org).

To grab the latest sources, you can do the following:

 1. Log in to the netfilter CVS server anonymously:

cvs -d :pserver:cvs@pserver.netfilter.org:/cvspublic login

 2. When it asks you for a password type `cvs'.
 3. Check out the code using:

cvs -d :pserver:cvs@pserver.netfilter.org:/cvspublic co netfilter/userspace

 4. To update to the latest version, use

cvs update -d -P

Netfilter Base

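For IPv4, netfilter defines five hooks: NF_IP_PRE_ROUTING, NF_IP_LOCAL_IN, NF_IP_FORWARD, NF_IP_LOCAL_OUT and NF_IP_POST_ROUTING. Each one is activated at a fixed point in a packet's traversal of the IPv4 stack. This is the essence of netfilter.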

Kernel modules can register to listen at any of these hooks. A module that registers a function must specify the priority of the function within the hook; then when that netfilter hook is called from the core networking code, each module registered at that point is called in order of priority, and is free to manipulate the packet. The module can then tell netfilter to do one of five things:

 1. NF_ACCEPT: continue traversal as normal.
 2. NF_DROP: drop the packet; don't continue traversal.
 3. NF_STOLEN: I've taken over the packet; don't continue traversal.
 4. NF_QUEUE: queue the packet (usually for userspace handling).
 5. NF_REPEAT: call this hook again.
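The registration and verdict scheme above can be modelled in ordinary userspace C. What follows is only a sketch under loud assumptions, not kernel code: `struct hook_entry`, `register_hook()`, `run_hooks()` and the two example hooks are names invented for this illustration, and a packet is reduced to a single integer. Only the verdict names come from the text; their numbering follows linux/netfilter.h.

```c
#include <assert.h>
#include <stddef.h>

/* Verdicts, numbered as in <linux/netfilter.h>. */
enum { NF_DROP, NF_ACCEPT, NF_STOLEN, NF_QUEUE, NF_REPEAT };

#define MAX_HOOKS 8

/* Hypothetical userspace stand-in for a registered hook function. */
typedef int (*hook_fn)(int *pkt);

struct hook_entry {
    hook_fn fn;
    int priority;               /* lower values run first */
};

static struct hook_entry hooks[MAX_HOOKS];
static size_t nhooks;

/* Insert the entry so the array stays sorted by ascending priority. */
static void register_hook(hook_fn fn, int priority)
{
    size_t i = nhooks;

    assert(nhooks < MAX_HOOKS);
    while (i > 0 && hooks[i - 1].priority > priority) {
        hooks[i] = hooks[i - 1];
        i--;
    }
    hooks[i].fn = fn;
    hooks[i].priority = priority;
    nhooks++;
}

/* Call each hook in priority order and act on its verdict. */
static int run_hooks(int *pkt)
{
    size_t i = 0;

    while (i < nhooks) {
        int verdict = hooks[i].fn(pkt);

        if (verdict == NF_REPEAT)
            continue;           /* call the same hook again */
        if (verdict != NF_ACCEPT)
            return verdict;     /* DROP, STOLEN or QUEUE ends traversal */
        i++;
    }
    return NF_ACCEPT;
}

/* Example hooks: one alters the packet, the other filters on the result. */
static int double_payload(int *pkt) { *pkt *= 2; return NF_ACCEPT; }
static int drop_large(int *pkt) { return *pkt > 100 ? NF_DROP : NF_ACCEPT; }
```

Registering `drop_large` at priority 0 and `double_payload` at priority -150 makes the doubling run first regardless of registration order, so the filter sees the altered packet. Note that in this model a hook that always returns NF_REPEAT would loop forever.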

The other parts of netfilter (handling queued packets, cool comments) will be covered in the kernel section later.

Upon this foundation, we can build fairly complex packet manipulations, as shown in the next two sections.

Packet Selection: IP Tables

Packet Filtering

This table, `filter', should never alter packets: only filter them.

One of the advantages of iptables filter over ipchains is that it is small and fast, and it hooks into netfilter at the NF_IP_LOCAL_IN, NF_IP_FORWARD and NF_IP_LOCAL_OUT points. This means that for any given packet, there is one (and only one) possible place to filter it. This makes things much simpler for users than ipchains was. Also, the fact that the netfilter framework provides both the input and output interfaces for the NF_IP_FORWARD hook means that many kinds of filtering are far simpler.
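The "one and only one" claim can be checked with a toy model of the traversal. This is a sketch, assuming the idealized paths (incoming packets are either delivered locally or forwarded, locally-generated packets go straight out); `traverse()`, `is_filter_hook()` and `filter_hooks_on_path()` are hypothetical helpers, not kernel functions. The hook numbering follows linux/netfilter_ipv4.h.

```c
#include <assert.h>
#include <stddef.h>

/* IPv4 hook numbers, as in <linux/netfilter_ipv4.h>. */
enum {
    NF_IP_PRE_ROUTING, NF_IP_LOCAL_IN, NF_IP_FORWARD,
    NF_IP_LOCAL_OUT, NF_IP_POST_ROUTING, NF_IP_NUMHOOKS
};

/* Write the hooks a packet traverses into path[]; return how many.
 * dest_is_local is ignored for locally-generated packets. */
static size_t traverse(int locally_generated, int dest_is_local,
                       int path[NF_IP_NUMHOOKS])
{
    size_t n = 0;

    if (locally_generated) {
        path[n++] = NF_IP_LOCAL_OUT;
        path[n++] = NF_IP_POST_ROUTING;
        return n;
    }
    path[n++] = NF_IP_PRE_ROUTING;
    if (dest_is_local) {
        path[n++] = NF_IP_LOCAL_IN;
        return n;
    }
    path[n++] = NF_IP_FORWARD;
    path[n++] = NF_IP_POST_ROUTING;
    return n;
}

/* The three hooks the `filter' table listens on. */
static int is_filter_hook(int hook)
{
    return hook == NF_IP_LOCAL_IN || hook == NF_IP_FORWARD ||
           hook == NF_IP_LOCAL_OUT;
}

/* Count how many filter hooks lie on a packet's path. */
static size_t filter_hooks_on_path(int locally_generated, int dest_is_local)
{
    int path[NF_IP_NUMHOOKS];
    size_t n = traverse(locally_generated, dest_is_local, path);
    size_t i, count = 0;

    assert(n <= (size_t)NF_IP_NUMHOOKS);
    for (i = 0; i < n; i++)
        if (is_filter_hook(path[i]))
            count++;
    return count;
}
```

For each of the three kinds of packet (incoming to this box, forwarded, locally generated) the count comes out as one, which is what lets a user write a rule without worrying that the packet is also being filtered somewhere else.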

Note: I have ported the kernel portions of both ipchains and ipfwadm as modules on top of netfilter, enabling the use of the old ipfwadm and ipchains userspace tools without requiring an upgrade.

NAT

This is the realm of the `nat' table, which is fed packets from two netfilter hooks: for non-local packets, the NF_IP_PRE_ROUTING and NF_IP_POST_ROUTING hooks are perfect for destination and source alterations respectively. If CONFIG_IP_NF_NAT_LOCAL is defined, the hooks NF_IP_LOCAL_OUT and NF_IP_LOCAL_IN are used for altering the destination of local packets.

This table is slightly different from the `filter' table, in that only the first packet of a new connection will traverse the table: the result of this traversal is then applied to all future packets in the same connection.
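The "first packet only" behaviour amounts to caching the table's decision per connection. Here is a minimal userspace sketch of that idea; everything in it is an assumption made for illustration: `struct tuple`, `struct conn`, `nat_table_lookup()` with its single hard-coded rule, and the fixed-size array standing in for the real connection table. Reply-direction packets are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tuple { uint32_t src, dst; uint16_t sport, dport; };

struct conn {
    struct tuple orig;          /* tuple of the connection's first packet */
    uint32_t new_dst;           /* mapping chosen when the table was walked */
    uint16_t new_dport;
};

#define MAX_CONNS 16
static struct conn conns[MAX_CONNS];
static size_t nconns;
static unsigned table_traversals;   /* counts walks of the `nat' table */

static int same_tuple(const struct tuple *a, const struct tuple *b)
{
    return a->src == b->src && a->dst == b->dst &&
           a->sport == b->sport && a->dport == b->dport;
}

/* Hypothetical rule set: anything for port 80 goes to 10.0.0.2:8080. */
static void nat_table_lookup(const struct tuple *t,
                             uint32_t *dst, uint16_t *dport)
{
    table_traversals++;
    *dst = t->dst;
    *dport = t->dport;
    if (t->dport == 80) {
        *dst = 0x0a000002;      /* 10.0.0.2 */
        *dport = 8080;
    }
}

/* Walk the table for the first packet only; reuse the result afterwards. */
static void nat_packet(struct tuple *pkt)
{
    struct conn *c = NULL;
    size_t i;

    for (i = 0; i < nconns; i++)
        if (same_tuple(&conns[i].orig, pkt))
            c = &conns[i];
    if (c == NULL) {
        assert(nconns < MAX_CONNS);
        c = &conns[nconns++];
        c->orig = *pkt;
        nat_table_lookup(pkt, &c->new_dst, &c->new_dport);
    }
    pkt->dst = c->new_dst;
    pkt->dport = c->new_dport;
}
```

Sending two packets with the same original tuple through `nat_packet()` rewrites both identically but bumps `table_traversals` only once.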

Masquerading, Port Forwarding, Transparent Proxying

I divide NAT into Source NAT (where the first packet has its source altered), and Destination NAT (the first packet has its destination altered).

Masquerading is a special form of Source NAT: port forwarding and transparent proxying are special forms of Destination NAT. These are now all done using the NAT framework, rather than being independent entities.
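To make the taxonomy concrete, the three special cases can be written as thin wrappers around one rewrite primitive. This is a sketch only: `enum manip`, `nat_manip()` and the wrapper names are made up for this example and say nothing about how the real NAT code is organised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum manip { MANIP_SRC, MANIP_DST };    /* hypothetical names */

struct pkt { uint32_t saddr, daddr; uint16_t sport, dport; };

/* The single primitive: rewrite either the source or the destination. */
static void nat_manip(struct pkt *p, enum manip which,
                      uint32_t addr, uint16_t port)
{
    assert(p != NULL);
    if (which == MANIP_SRC) {
        p->saddr = addr;
        p->sport = port;
    } else {
        p->daddr = addr;
        p->dport = port;
    }
}

/* Masquerading: Source NAT to the outgoing interface's address. */
static void masquerade(struct pkt *p, uint32_t out_if_addr)
{
    nat_manip(p, MANIP_SRC, out_if_addr, p->sport);
}

/* Port forwarding: Destination NAT to an internal server. */
static void port_forward(struct pkt *p, uint32_t server, uint16_t port)
{
    nat_manip(p, MANIP_DST, server, port);
}

/* Transparent proxying: Destination NAT to a port on this box. */
static void redirect(struct pkt *p, uint32_t local_addr, uint16_t proxy_port)
{
    nat_manip(p, MANIP_DST, local_addr, proxy_port);
}
```

In this model masquerading keeps the source port and changes only the address; a real implementation may also need to pick a different port when the original one is already in use.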

Packet Mangling

The packet mangling table (the `mangle' table) is used for actual changing of packet information. Example applications are the TOS and TCPMSS targets. The mangle table hooks into all five netfilter hooks. (Please note that this changed with kernel 2.4.18; previous kernels didn't have mangle attached to all hooks.)
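As an illustration of what "changing packet information" involves, here is a userspace sketch that rewrites the TOS byte of a raw IPv4 header and repairs the header checksum. The byte offsets follow the IPv4 header layout (TOS at byte 1, checksum at bytes 10 and 11). The function names are hypothetical, and recomputing the checksum from scratch is a simplification chosen for clarity; I make no claim that the TOS target does it this way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over the header. With the checksum field
 * zeroed this yields the value to store; over a valid header with the
 * checksum in place it yields 0. */
static uint16_t ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(len % 2 == 0);
    for (i = 0; i < len; i += 2)
        sum += (uint32_t)(hdr[i] << 8 | hdr[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);     /* fold the carries */
    return (uint16_t)~sum;
}

/* Header length in bytes, from the IHL nibble (32-bit words). */
static size_t ip_hdr_len(const uint8_t *hdr)
{
    return (size_t)(hdr[0] & 0x0f) * 4;
}

static int header_ok(const uint8_t *hdr)
{
    return ip_checksum(hdr, ip_hdr_len(hdr)) == 0;
}

/* Overwrite the TOS byte, then recompute and store the checksum. */
static void set_tos(uint8_t *hdr, uint8_t tos)
{
    uint16_t csum;

    hdr[1] = tos;
    hdr[10] = 0;
    hdr[11] = 0;
    csum = ip_checksum(hdr, ip_hdr_len(hdr));
    hdr[10] = (uint8_t)(csum >> 8);
    hdr[11] = (uint8_t)(csum & 0xff);
}
```

Forgetting the checksum step is the classic mangling bug: the packet leaves with the new TOS but is discarded by the next hop as corrupt.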

Connection Tracking

Connection tracking is fundamental to NAT, but it is implemented as a separate module; this allows an extension to the packet filtering code to simply and cleanly use connection tracking (the `state' module).
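What a filtering extension gets from connection tracking is, roughly, an answer to "have we seen this conversation before, and in both directions?". The sketch below models that in userspace under simplifying assumptions: only two states, no timeouts, no protocol helpers, and a fixed-size table. `enum ct_state`, `struct conn` and `track()` are invented names, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum ct_state { CT_NEW, CT_ESTABLISHED };   /* hypothetical names */

struct tuple { uint32_t src, dst; uint16_t sport, dport; };

struct conn {
    struct tuple orig;          /* tuple of the first packet seen */
    int seen_reply;             /* set once traffic flows back */
};

#define MAX_CONNS 16
static struct conn conns[MAX_CONNS];
static size_t nconns;

static int tuple_eq(const struct tuple *a, const struct tuple *b)
{
    return a->src == b->src && a->dst == b->dst &&
           a->sport == b->sport && a->dport == b->dport;
}

/* The tuple a reply to t would carry. */
static struct tuple invert(const struct tuple *t)
{
    struct tuple r = { t->dst, t->src, t->dport, t->sport };
    return r;
}

/* Classify a packet, creating a connection entry if none matches. */
static enum ct_state track(const struct tuple *pkt)
{
    struct tuple rev = invert(pkt);
    size_t i;

    for (i = 0; i < nconns; i++) {
        if (tuple_eq(&conns[i].orig, &rev)) {
            conns[i].seen_reply = 1;        /* reply direction */
            return CT_ESTABLISHED;
        }
        if (tuple_eq(&conns[i].orig, pkt))
            return conns[i].seen_reply ? CT_ESTABLISHED : CT_NEW;
    }
    assert(nconns < MAX_CONNS);
    conns[nconns].orig = *pkt;
    conns[nconns].seen_reply = 0;
    nconns++;
    return CT_NEW;
}
```

A filter rule built on this can accept anything classified as established and admit new connections only from the trusted side, without knowing anything about how the tracking itself is done.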

Other Additions

The new flexibility provides both the opportunity to do really funky things and the chance for people to write enhancements or complete replacements that can be mixed and matched.

handbook/handbook/netfilter.txt · Last modified: 2010/04/15 21:18 (external edit)