
Netconsole

Introduction

Netconsole is a kernel device that redirects normal kernel console messages into layer 2 (Ethernet) packets. The intent is to provide better status and diagnostic feedback while avoiding the expense of a serial concentrator system.

Netconsole consists of several pieces:

 1. Netconsole module
 2. Modified Ethernet driver
 3. etherconsole_catcher
 4. patched kernel
 5. etherconsole_sysrq command line app

netconsole module

The netconsole module is a kernel driver that presents itself to the kernel as a write-only console device. In addition to the internal kernel API, it also supports character device 240,0 (/dev/netconsole) for use by scripts and, optionally, syslogd and klogd.

Since a primary benefit of console access is the ability to capture oops reports and other diagnostics when the system is in an unhealthy state, the normal network stack is bypassed. Netconsole sends its packets directly to a modified Ethernet driver. Even in a crashing system, a modern DMA-driven Ethernet card is at least as likely to transmit the information successfully as an interrupt-driven or polled serial driver.

Loading

Before loading netconsole, a suitable ethernet driver must be either compiled into the kernel or loaded. Then:

modprobe netconsole dev=<devname>

Where devname is the Ethernet device (such as eth0) that is to transmit the console packets.

To be most effective, console=ether should be passed to the kernel on its command line. This will cause the kernel to use the netconsole device for output in a crash situation.

In order for netconsole messages to be sent, the interface must be up, though it need not have an IP address assigned. ifconfig <interface> up will suffice.
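
The "interface must be up" precondition can be checked from a script before relying on netconsole output. The following is a minimal sketch, assuming a kernel that exposes interface flags through sysfs at /sys/class/net/<interface>/flags; the function names are illustrative and are not part of netconsole.

```python
IFF_UP = 0x1  # "interface is up" flag bit, as defined in <linux/if.h>


def interface_is_up(flags_text):
    """Return True if a hex flags string such as '0x1003' has IFF_UP set."""
    return bool(int(flags_text.strip(), 16) & IFF_UP)


def read_flags(iface, sysfs_root="/sys/class/net"):
    """Read the raw flags string for an interface.

    Assumption: the running kernel provides sysfs; older kernels may not.
    """
    with open(f"{sysfs_root}/{iface}/flags") as f:
        return f.read()
```

For example, interface_is_up(read_flags("eth0")) would report whether eth0 is ready to carry console packets, whether or not it has an IP address.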

Modified ethernet driver

In order for netconsole to work, it is necessary to bypass the (possibly crashing) network stack as well as the possibly long packet queue. To that end, netconsole needs to poll the driver for an available transmit slot. The modification for netconsole consists of adding a poll method that simulates an interrupt from the card in order to clear a tx slot as soon as possible (or at all, since netconsole may be blocking interrupt servicing).
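
The polling idea can be illustrated with a toy model. This is not driver code and does not reflect the actual 3c59x, eepro100 or e1000 changes; the class and function names are invented for illustration. It only shows why calling the interrupt handler directly frees a transmit slot when no real interrupt will be serviced.

```python
class ToyNic:
    """Toy model of a NIC transmit ring (illustrative only)."""

    def __init__(self, slots):
        self.slots = slots
        self.in_flight = 0   # descriptors handed to the card, not yet reclaimed
        self.completed = 0   # descriptors the card has finished sending
        self.sent = []

    def hardware_tick(self):
        # Simulation step: the card finishes sending what it holds (DMA, no CPU).
        self.completed = self.in_flight

    def interrupt_handler(self):
        # Normally runs on a tx-done interrupt: reclaim finished descriptors.
        self.in_flight -= self.completed
        self.completed = 0

    def poll(self):
        # The added poll method: run the handler as if the card had interrupted.
        self.interrupt_handler()

    def try_xmit(self, frame):
        if self.in_flight >= self.slots:
            return False     # ring full, nothing reclaimed yet
        self.in_flight += 1
        self.sent.append(frame)
        return True


def console_write(nic, frame, max_polls=1000):
    """Send one frame, polling for a free slot instead of waiting for an interrupt."""
    for _ in range(max_polls):
        if nic.try_xmit(frame):
            return True
        nic.hardware_tick()
        nic.poll()
    return False
```

With a two-slot ring, a third console_write call finds the ring full, polls once, and then succeeds, even though the interrupt handler was never invoked by an interrupt.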

Currently, 3c59x, eepro100 and e1000 have this modification.

etherconsole_catcher

The other side of netconsole is etherconsole_catcher, a daemon that runs on the master listening for packets with the netconsole protocol (0x0777).

etherconsole_catcher [-d|--debug] [-l|--logdir <directory to write log files into>] [--iface <network interface to listen on (default is eth0)>]

-d|--debug Stay in foreground and output debugging messages.

-l|--logdir Write all log files into the specified directory (rather than PWD).

--iface Listen on the specified network interface for netconsole messages.

On startup, etherconsole_catcher will read /etc/beowulf/bootMAC (if it exists) or /etc/beowulf/config to load a list of MAC addresses for the boot net and the corresponding node numbers. When an incoming packet with a source MAC address in that list is processed, its contents, along with a date stamp, will be appended to <logdir>/node.<node number>. Should there be no match, the file will be named after the source MAC address in hex notation.
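
The file-naming rule described above can be sketched as follows. This is not the etherconsole_catcher source. It assumes that the console text follows the 14-byte Ethernet header directly and that the fallback file name uses colon-separated hex; neither detail is specified here. The mac_to_node dictionary stands in for whatever is loaded from the configuration files, whose format is not shown.

```python
import struct
import time

NETCONSOLE_ETHERTYPE = 0x0777  # protocol number given in the text


def parse_frame(frame):
    """Return (source MAC bytes, payload), or None if not a netconsole frame."""
    if len(frame) < 14:
        return None
    _dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != NETCONSOLE_ETHERTYPE:
        return None
    # Assumption: the message text starts right after the Ethernet header.
    return src, frame[14:]


def log_file_name(src_mac, mac_to_node):
    """Pick 'node.<n>' for a known MAC, otherwise the MAC itself in hex."""
    node = mac_to_node.get(src_mac)
    if node is not None:
        return f"node.{node}"
    return src_mac.hex(":")  # separator is an assumption (needs Python 3.8+)


def format_entry(payload, now=None):
    """Prefix the message with a date stamp; the stamp format is illustrative."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return f"{stamp} {payload.decode('ascii', errors='replace')}"
```

A real catcher would receive frames from a raw packet socket bound to the chosen interface and append format_entry(...) to the file named by log_file_name(...) under the log directory.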

It is worth noting that the kernel buffers all console output until a console device becomes available for it, so the netconsole log files will show all messages from the start of the boot process. Since those messages are backlog, IF THE BOOT FAILS PRIOR TO LOADING NETCONSOLE, THERE WILL BE NO OUTPUT AT ALL. This is an unfortunate but unavoidable limitation of netconsole.

Patched kernel

To support netconsole, two minor modifications are made to the kernel.

  • The dev_poll device method is added to net devices. Network drivers that do not support this feature will leave that method as NULL, so netconsole must check for NULL before using it.
  • A small modification is made to handle incoming packets for protocol 0x778 immediately rather than queueing them through the network stack. This protocol carries SysRq commands over the local network. These packets bypass the network queues and stack and take action during the interrupt in which the packet was received. This is done to maximise the chances of having a useful effect if the system is locked up but still responding to interrupts (approximately as likely as responding to Alt-SysRq from the keyboard). These 'magic packets' work even before the netconsole driver loads, and do not require a modification to the Ethernet driver. To enable this feature, the kernel should be compiled with CONFIG_NET_SYSRQ (Net/'Support for net SysRq packets').
  • WARNING: Currently, this feature makes no attempt to be secure whatsoever! Anyone able to access a raw socket can send the magic packet to hard reset the machine! This is acceptable for a compute node in a cluster environment on a private network, but may cause problems in a less controlled LAN environment. Though the protocol is not routable over the internet, it will get through a switch or bridge. If you are the least bit concerned that someone attached to the LAN might misuse this, do not enable the feature. NOTE: since these packets are processed before entering the network stack, firewall rules on the machine WILL NOT block this packet.
  • This feature is controlled by /proc/sys/net/core/sysrq. Setting this sysctl to “0” will disable acting on net sysrq packets. The default value is “1” (net sysrq enabled).
  • In order to receive the magic packet, the interface must be up, though it need not have an IP address assigned to it.
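
Given the warning above, it can be useful to turn the feature off from a boot script. The following is a minimal sketch that writes to the sysctl file named in the text. It assumes a patched kernel that actually provides /proc/sys/net/core/sysrq and needs root privileges on a real system; the function names are illustrative.

```python
NET_SYSRQ_SYSCTL = "/proc/sys/net/core/sysrq"  # path given in the text


def set_net_sysrq(enabled, path=NET_SYSRQ_SYSCTL):
    """Write '1' (act on net SysRq packets) or '0' (ignore them)."""
    with open(path, "w") as f:
        f.write("1\n" if enabled else "0\n")


def net_sysrq_enabled(path=NET_SYSRQ_SYSCTL):
    """Return True unless the sysctl currently reads '0'."""
    with open(path) as f:
        return f.read().strip() != "0"
```

Calling set_net_sysrq(False) corresponds to setting the sysctl to “0” as described above.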

etherconsole_sysrq

etherconsole_sysrq [--iface <interface>] (-S|--node) <node number>

--iface Send the magic packet on the specified interface.

-S|--node Send the magic packet to the specified node.

etherconsole_sysrq uses the same configuration files (/etc/beowulf/bootMAC or /etc/beowulf/config) as etherconsole_catcher to determine a MAC address from the supplied node number.
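
The node-number-to-MAC step can be sketched as below. This is not the etherconsole_sysrq source. The node_to_mac dictionary is a stand-in for the parsed configuration, whose file format is not described here, and only the Ethernet header is built because the payload layout of the magic packet is not documented in this page.

```python
import struct

NET_SYSRQ_ETHERTYPE = 0x778  # protocol number given in the text


def parse_mac(text):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    raw = bytes.fromhex(text.replace(":", ""))
    if len(raw) != 6:
        raise ValueError(f"not a MAC address: {text!r}")
    return raw


def sysrq_header(node, node_to_mac, src_mac):
    """Build the 14-byte Ethernet header addressed to the given node.

    node_to_mac is a hypothetical {node number: MAC string} mapping.
    """
    dst = parse_mac(node_to_mac[node])
    return struct.pack("!6s6sH", dst, parse_mac(src_mac), NET_SYSRQ_ETHERTYPE)
```

An unknown node number raises KeyError here; a real tool would presumably report the error to the user instead.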

netcons.txt · Last modified: 2010/04/15 21:18 (external edit)