
Traffic Shaping with CAKE on NixOS

I wanted to make sure my 1 Gbit fiber connection stays responsive even under heavy load. The standard approach is traffic shaping with a smart queue management (SQM) algorithm. CAKE is the best option available today — here’s how I set it up on NixOS.

What’s Bufferbloat?

When you saturate your connection, packets queue up in buffers — at your router, your modem, your ISP. These buffers can hold hundreds of milliseconds worth of data. A video call or SSH session suddenly feels laggy because its packets are stuck behind a bulk download.

You can test your connection at bufferbloat.net. An “A” rating means latency stays low under load.
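
You can also eyeball it from a terminal: compare idle latency with latency while something saturates the link. A rough sketch (the ping target and download URL are just placeholders):

# Idle baseline
ping -c 20 1.1.1.1

# In another terminal, saturate the downlink with any big transfer
curl -o /dev/null https://speed.example.com/largefile

# Re-run the ping while the download is going; a jump from ~10 ms to
# hundreds of ms under load is bufferbloat
ping -c 20 1.1.1.1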

How CAKE Works

CAKE (Common Applications Kept Enhanced) is a qdisc — a kernel component that decides which packets to send next and which to drop. It combines several techniques:

Active Queue Management (AQM): Instead of letting buffers fill up completely, CAKE uses COBALT, a CoDel variant with a BLUE fallback, to start dropping packets early when congestion builds. This signals TCP to slow down before latency spikes.

Flow isolation: CAKE hashes each flow (source and destination addresses, ports, and protocol) into one of 1024 queues. A single bulk transfer can’t starve other flows; each gets fair access to bandwidth. This happens automatically, with no configuration required.

Deficit Round Robin: Queues are serviced in rotation, with each getting a “deficit” of bytes to send. Flows that sent less than their share carry credit forward. This ensures fairness even with variable packet sizes.

Overhead compensation: CAKE accounts for link-layer overhead (PPPoE headers, ATM cell padding) that the kernel doesn’t see. This prevents the shaper from thinking it has more bandwidth than actually available.
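
Put together, a minimal CAKE setup on a plain Ethernet uplink is a single command. This is a generic sketch (eth0 and 95mbit are placeholders; my actual PPPoE configuration comes later):

# Shaping, flow isolation, and AQM in one qdisc; the ethernet keyword
# adds standard frame overhead compensation
tc qdisc replace dev eth0 root cake bandwidth 95mbit ethernet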

Why CAKE Over Other Qdiscs?

vs. fq_codel: The previous gold standard. CAKE builds on fq_codel but adds bandwidth shaping, overhead handling, and per-host fairness. fq_codel only does queue management, not rate limiting.

vs. HTB + fq_codel: HTB (Hierarchical Token Bucket) is the traditional way to shape bandwidth, often combined with fq_codel for AQM. This works, but requires manual class configuration and doesn’t handle overhead. CAKE replaces both in a single, simpler qdisc.

vs. SFQ: An older fair queuing algorithm without AQM. Doesn’t prevent bufferbloat, just distributes it fairly.

CAKE is essentially “HTB + fq_codel + overhead handling + host fairness” in one package, designed specifically for home router use cases.
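
To make that concrete, here’s roughly what the traditional two-piece setup looks like next to the CAKE equivalent (eth0 and the rate are placeholders):

# Traditional: HTB class for the rate limit, fq_codel underneath for AQM
tc qdisc add dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 580mbit
tc qdisc add dev eth0 parent 1:10 fq_codel

# CAKE: shaping, AQM, flow fairness, and overhead handling in one qdisc
tc qdisc add dev eth0 root cake bandwidth 580mbit overhead 34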

Ingress Shaping with IFB

CAKE handles egress (upload) directly. For ingress (download) there’s a catch: the kernel can only attach a shaping qdisc to an interface’s egress side. So we redirect incoming packets through an IFB (Intermediate Functional Block) virtual interface and shape them on its egress instead.

This works because the shaping happens before packets reach their destination — we’re rate-limiting how fast the kernel processes them, which causes TCP to throttle the sender.
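
Stripped of the NixOS packaging, the essence is just a handful of tc commands; the service in the next section runs essentially these:

ip link add name ifb-ppp0 type ifb
ip link set ifb-ppp0 up
tc qdisc add dev ppp0 handle ffff: ingress
tc filter add dev ppp0 parent ffff: protocol all \
  u32 match u32 0 0 action mirred egress redirect dev ifb-ppp0
tc qdisc add dev ifb-ppp0 root cake bandwidth 940mbit ingress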

NixOS Configuration

{ config, lib, pkgs, ... }:

{
  boot.kernelModules = [ "sch_cake" "ifb" ];

  systemd.services.sqm-cake = {
    description = "SQM CAKE traffic shaping for PPPoE";
    after = [ "pppd-wan.service" ];
    requires = [ "pppd-wan.service" ];
    wantedBy = [ "multi-user.target" ];
    path = [ pkgs.iproute2 pkgs.kmod pkgs.ethtool ];

    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;

      ExecStart = pkgs.writeShellScript "sqm-cake-start" ''
        # Wait for ppp0 to be fully up
        for i in $(seq 1 30); do
          ip link show ppp0 up >/dev/null 2>&1 && break
          sleep 1
        done

        # Enable UDP GRO forwarding (helps Tailscale)
        ethtool -K ppp0 rx-udp-gro-forwarding on || true

        modprobe sch_cake

        # Egress: 580Mbit (leave headroom under 600Mbit uplink)
        tc qdisc replace dev ppp0 root cake bandwidth 580mbit \
          besteffort overhead 34 mpu 64 pppoe-ptm

        # Create IFB for ingress shaping
        modprobe ifb
        ip link add name ifb-ppp0 type ifb 2>/dev/null || true
        ip link set ifb-ppp0 up

        # Redirect ingress to IFB
        tc qdisc replace dev ppp0 handle ffff: ingress
        tc filter replace dev ppp0 parent ffff: protocol all \
          u32 match u32 0 0 action mirred egress redirect dev ifb-ppp0

        # Ingress: 940Mbit (under 1Gbit downlink)
        tc qdisc replace dev ifb-ppp0 root cake bandwidth 940mbit \
          besteffort wash ingress overhead 34 mpu 64 pppoe-ptm
      '';

      ExecStop = pkgs.writeShellScript "sqm-cake-stop" ''
        tc qdisc del dev ppp0 root 2>/dev/null || true
        tc qdisc del dev ppp0 ingress 2>/dev/null || true
        ip link del ifb-ppp0 2>/dev/null || true
      '';
    };
  };
}
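
After rebuilding, the unit should come up once ppp0 does. Quick checks:

sudo nixos-rebuild switch
systemctl status sqm-cake.service
journalctl -u sqm-cake.service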

Key Parameters Explained

bandwidth: Set this ~5-10% below your actual line rate. The shaper has to be the bottleneck; if your ISP’s buffer fills up first, the queue forms there and you lose control.

overhead 34 mpu 64 pppoe-ptm: PPPoE over VDSL2 (PTM mode) adds 34 bytes of overhead per packet. The mpu 64 sets the minimum packet size used for the overhead calculation. If you’re on a different encapsulation (DOCSIS, plain Ethernet), check the tc-cake man page for the right keywords; a couple of examples follow below.

besteffort: Disables DSCP-based priority tins, so all traffic goes into a single tier; per-flow fairness still applies within it. Use diffserv4 if you want DSCP-based prioritization.

wash: Clears DSCP markings on ingress. Some ISPs set garbage values that mess with prioritization.

ingress: Tells CAKE it’s shaping traffic that has already crossed the bottleneck, so dropped packets still count toward the bandwidth limit; this keeps the shaper in control of the incoming rate.
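
For other link types, the same shaper line mostly just swaps keywords. A couple of hedged examples (interface names and rates are placeholders; verify the keywords against the tc-cake man page for your setup):

# DOCSIS / cable modem: the docsis keyword sets the appropriate overhead
tc qdisc replace dev eth0 root cake bandwidth 95mbit docsis besteffort

# Plain Ethernet, with DSCP-based priority tins instead of besteffort
tc qdisc replace dev eth0 root cake bandwidth 950mbit ethernet diffserv4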

Verifying It Works

Check the qdisc is active:

tc -s qdisc show dev ppp0
qdisc cake 8001: root refcnt 2 bandwidth 580Mbit besteffort overhead 34 mpu 64 ptm
Sent 1284923847 bytes 1293847 pkt (dropped 42, overlimits 8234 requeues 0)
backlog 0b 0p requeues 0
...
tc -s qdisc show dev ifb-ppp0
qdisc cake 8002: root refcnt 2 bandwidth 940Mbit besteffort wash ingress overhead 34 mpu 64 ptm
Sent 9283746123 bytes 7293182 pkt (dropped 127, overlimits 29341 requeues 0)
backlog 0b 0p requeues 0
...

The key indicators: packet counts incrementing, backlog 0b 0p (no queued packets), and some dropped packets (that’s AQM doing its job).

Run the bufferbloat test again — you should get an A rating with latency staying consistent under load.

Why a systemd Service?

PPPoE interfaces come and go. The service depends on pppd-wan.service and waits for ppp0 to exist before configuring. On reconnect, the qdiscs get reapplied.

You could also configure the qdisc through networkd’s .network settings or a pppd ip-up script, but a dedicated service makes the dependency chain explicit and logs cleanly to journald.

Caveats

  • If your line rate varies (cable, LTE), a fixed bandwidth won’t hold; look at CAKE’s autorate-ingress keyword or a similar adaptive approach (see the sketch after this list)
  • The wash keyword might strip legitimate DSCP markings if your internal network uses them
  • Test your actual line speed and adjust the bandwidth values accordingly
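
For a variable-rate link, the ingress shaper could lean on CAKE’s built-in estimator instead of a fixed number. A sketch only; check the tc-cake man page for how autorate-ingress interacts with an explicit bandwidth:

# Let CAKE estimate ingress capacity instead of pinning it to 940mbit
tc qdisc replace dev ifb-ppp0 root cake autorate-ingress besteffort wash ingress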
