This document is a journey; some parts are well-traveled, and in other areas you will find yourself almost alone. The best advice I can give you is to grab a large, cozy mug of coffee or hot chocolate, get into a comfortable chair, and absorb the contents before venturing out into the sometimes dangerous world of network hacking.
What is netfilter?
netfilter is a framework for packet mangling, outside the normal Berkeley socket interface. It has four parts. Firstly, each protocol defines “hooks” (IPv4 defines 5) which are well-defined points in a packet's traversal of that protocol stack. At each of these points, the protocol will call the netfilter framework with the packet and the hook number.
Secondly, parts of the kernel can register to listen to the different hooks for each protocol. So when a packet is passed to the netfilter framework, it checks to see if anyone has registered for that protocol and hook; if so, they each get a chance to examine (and possibly alter) the packet in order, then discard the packet (NF_DROP), allow it to pass (NF_ACCEPT), tell netfilter to forget about the packet (NF_STOLEN), or ask netfilter to queue the packet for userspace (NF_QUEUE).
The third part is that packets that have been queued are collected (by the ip_queue driver) for sending to userspace; these packets are handled asynchronously.
The final part consists of cool comments in the code and documentation. This is instrumental for any experimental project. The netfilter motto is (stolen shamelessly from Cort Dougan):
``So... how is this better than KDE?''
(This motto narrowly edged out `Whip me, beat me, make me use ipchains').
In addition to this raw framework, various modules have been written which provide functionality similar to previous (pre-netfilter) kernels, in particular, an extensible NAT system, and an extensible packet filtering system (iptables).
What's wrong with what we had in 2.0 and 2.2?
No infrastructure established for passing packet to userspace.
Transparent proxying is a crock.
Creating packet filter rules independent of interface addresses is not possible.
Masquerading is tacked onto packet filtering.
Interactions between packet filtering and masquerading make firewalling complex.
TOS manipulation, redirect, ICMP unreachable and mark (which can affect port forwarding, routing, and QoS) are tacked onto packet filter code as well.
ipchains code is neither modular, nor extensible (eg. MAC address filtering, options filtering, etc).
Lack of sufficient infrastructure has led to a profusion of different techniques.
Incompatibility between CONFIG_NET_FASTROUTE and packet filtering.
Inspection of packets dropped due to routing protection (eg. Source Address Verification) not possible.
No way of atomically reading counters on packet filter rules.
CONFIG_IP_ALWAYS_DEFRAG is a compile-time option, making life difficult for distributions who want one general-purpose kernel.
Who are you?
I'm the only one foolish enough to do this. I see many of the problems that people have with the current system, as well as getting exposure to what they are trying to do.
Why does it crash?
Woah! You should have seen it last week!
Because I'm not as great a programmer as we might all wish, and I certainly haven't tested all scenarios, because of lack of time, equipment and/or inspiration. I do have a testsuite, which I encourage you to contribute to.
Where Can I Get The Latest?
There is a CVS server on netfilter.org which contains the latest HOWTOs, userspace tools and testsuite. For casual browsing, you can use the Web Interface (http://cvs.netfilter.org).
To grab the latest sources, you can do the following:
1. Log in to the netfilter CVS server anonymously:
cvs -d :pserver:firstname.lastname@example.org:/cvspublic login
2. When it asks you for a password, type `cvs'.
3. Check out the code using:
# cvs -d :pserver:email@example.com:/cvspublic co netfilter/userspace
4. To update to the latest version, use
cvs update -d -P
Now that we have an example of netfilter for IPv4, you can see when each hook is activated. This is the essence of netfilter.
Kernel modules can register to listen at any of these hooks. A module that registers a function must specify the priority of the function within the hook; then when that netfilter hook is called from the core networking code, each module registered at that point is called in order of priority, and is free to manipulate the packet. The module can then tell netfilter to do one of five things:
1. NF_ACCEPT: continue traversal as normal.
2. NF_DROP: drop the packet; don't continue traversal.
3. NF_STOLEN: I've taken over the packet; don't continue traversal.
4. NF_QUEUE: queue the packet (usually for userspace handling).
5. NF_REPEAT: call this hook again.
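The dispatch described above can be sketched in ordinary userspace C. This is a toy model, not the kernel's registration API: `struct packet', `register_hook', `run_hooks', the two example hooks, the numeric verdict values, and the cap on NF_REPEAT are all assumptions made for illustration. It also assumes that a lower priority number runs first.

```c
#include <assert.h>
#include <stddef.h>

/* Verdict names follow the text; the numeric values here are arbitrary. */
enum verdict { NF_DROP, NF_ACCEPT, NF_STOLEN, NF_QUEUE, NF_REPEAT };

/* Hypothetical stand-in for a real packet. */
struct packet { int mark; };

typedef enum verdict (*hook_fn)(struct packet *pkt);

struct hook_entry { hook_fn fn; int priority; };

#define MAX_HOOKS   8
#define MAX_REPEATS 4   /* assumption: bound NF_REPEAT so the toy cannot spin forever */

static struct hook_entry hooks[MAX_HOOKS];
static size_t n_hooks;

/* Insert a function so the table stays sorted by ascending priority. */
void register_hook(hook_fn fn, int priority)
{
    size_t i = n_hooks;

    assert(n_hooks < MAX_HOOKS);
    while (i > 0 && hooks[i - 1].priority > priority) {
        hooks[i] = hooks[i - 1];
        i--;
    }
    hooks[i].fn = fn;
    hooks[i].priority = priority;
    n_hooks++;
}

/* Call each registered function in priority order and stop at the
 * first verdict that is not NF_ACCEPT. */
enum verdict run_hooks(struct packet *pkt)
{
    for (size_t i = 0; i < n_hooks; i++) {
        enum verdict v;
        int calls = 0;

        do {
            v = hooks[i].fn(pkt);
        } while (v == NF_REPEAT && ++calls < MAX_REPEATS);

        if (v == NF_REPEAT)
            return NF_DROP;     /* toy policy: give up on a hook that keeps repeating */
        if (v != NF_ACCEPT)
            return v;
    }
    return NF_ACCEPT;
}

/* Two hypothetical hooks: one alters the packet, one filters on the result. */
enum verdict set_mark(struct packet *pkt)
{
    pkt->mark = 7;
    return NF_ACCEPT;
}

enum verdict drop_marked(struct packet *pkt)
{
    return pkt->mark == 7 ? NF_DROP : NF_ACCEPT;
}
```

Registering `drop_marked' at priority 10 and `set_mark' at priority -5 means the packet is marked first and dropped second, whatever order the two registrations were made in.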
The other parts of netfilter (handling queued packets, cool comments) will be covered in the kernel section later.
Upon this foundation, we can build fairly complex packet manipulations, as shown in the next two sections.
Packet Selection: IP Tables
This table, `filter', should never alter packets: only filter them.
One of the advantages of iptables filter over ipchains is that it is small and fast, and it hooks into netfilter at the NF_IP_LOCAL_IN, NF_IP_FORWARD and NF_IP_LOCAL_OUT points. This means that for any given packet, there is one (and only one) possible place to filter it. This makes things much simpler for users than ipchains was. Also, the fact that the netfilter framework provides both the input and output interfaces for the NF_IP_FORWARD hook means that many kinds of filtering are far simpler.
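The "one and only one place" property can be written down as a small decision function. This is only a sketch: `enum origin' and `filter_hook_for' are names invented here, the enum values should not be relied on, and the sketch deliberately ignores locally generated packets addressed to the local machine itself.

```c
#include <assert.h>
#include <stdbool.h>

/* The five IPv4 hook names from the text; values are illustrative only. */
enum nf_ip_hook {
    NF_IP_PRE_ROUTING,
    NF_IP_LOCAL_IN,
    NF_IP_FORWARD,
    NF_IP_LOCAL_OUT,
    NF_IP_POST_ROUTING
};

/* Hypothetical description of where a packet came from. */
enum origin { FROM_NETWORK, LOCALLY_GENERATED };

/* Return the single hook at which the `filter' table sees this packet.
 * for_this_host only matters for packets that arrived from the network. */
enum nf_ip_hook filter_hook_for(enum origin from, bool for_this_host)
{
    if (from == LOCALLY_GENERATED)
        return NF_IP_LOCAL_OUT;
    return for_this_host ? NF_IP_LOCAL_IN : NF_IP_FORWARD;
}
```

A packet arriving from the network and destined elsewhere maps to NF_IP_FORWARD only, so a rule for it never has to be duplicated at an input or output point.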
Note: I have ported the kernel portions of both ipchains and ipfwadm as modules on top of netfilter, enabling the use of the old ipfwadm and ipchains userspace tools without requiring an upgrade.
This is the realm of the `nat' table, which is fed packets from two netfilter hooks: for non-local packets, the NF_IP_PRE_ROUTING and NF_IP_POST_ROUTING hooks are perfect for destination and source alterations respectively. If CONFIG_IP_NF_NAT_LOCAL is defined, the hooks NF_IP_LOCAL_OUT and NF_IP_LOCAL_IN are used for altering the destination of local packets.
This table is slightly different from the `filter' table, in that only the first packet of a new connection will traverse the table: the result of this traversal is then applied to all future packets in the same connection.
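The "first packet decides" behaviour can be illustrated with a toy userspace model. Everything here is an assumption made for illustration: `struct tuple', `nat_packet', `traverse_nat_rules', the fixed-size table, and the single hard-coded Source NAT rule. Reply packets and connection expiry are not modelled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical connection identifier: addresses and ports. */
struct tuple { uint32_t src, dst; uint16_t sport, dport; };

/* One tracked connection: the tuple as first seen, and the result
 * of the single rule traversal made for it. */
struct conn { struct tuple orig; struct tuple mapped; };

#define MAX_CONNS 16
#define NAT_SRC   0xC0000201u   /* 192.0.2.1, a documentation address */

static struct conn conns[MAX_CONNS];
static size_t n_conns;
int rule_traversals;            /* counts how often the rules were consulted */

static int tuple_eq(const struct tuple *a, const struct tuple *b)
{
    return a->src == b->src && a->dst == b->dst &&
           a->sport == b->sport && a->dport == b->dport;
}

/* Stand-in for walking the `nat' table: rewrite the source address. */
struct tuple traverse_nat_rules(const struct tuple *t)
{
    struct tuple out = *t;

    rule_traversals++;
    out.src = NAT_SRC;
    return out;
}

/* Apply NAT to a packet. Only a packet that starts a new connection
 * reaches the rules; later packets reuse the stored result. */
void nat_packet(struct tuple *pkt)
{
    for (size_t i = 0; i < n_conns; i++) {
        if (tuple_eq(&conns[i].orig, pkt)) {
            *pkt = conns[i].mapped;
            return;
        }
    }

    assert(n_conns < MAX_CONNS);
    conns[n_conns].orig = *pkt;
    conns[n_conns].mapped = traverse_nat_rules(pkt);
    *pkt = conns[n_conns].mapped;
    n_conns++;
}
```

Passing two packets with the same addresses and ports through `nat_packet' leaves `rule_traversals' at 1, and both packets come out with the same rewritten source.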
Masquerading, Port Forwarding, Transparent Proxying
I divide NAT into Source NAT (where the first packet has its source altered), and Destination NAT (the first packet has its destination altered).
Masquerading is a special form of Source NAT: port forwarding and transparent proxying are special forms of Destination NAT. These are now all done using the NAT framework, rather than being independent entities.
The packet mangling table (the `mangle' table) is used for actual changing of packet information. Example applications are the TOS and TCPMSS targets. The mangle table hooks into all five netfilter hooks. (Please note that this changed with kernel 2.4.18; previous kernels didn't have mangle attached to all hooks.)
Connection tracking is fundamental to NAT, but it is implemented as a separate module; this allows an extension to the packet filtering code to simply and cleanly use connection tracking (the `state' module).
The new flexibility provides both the opportunity to do really funky things and the means for people to write enhancements or complete replacements that can be mixed and matched.