Cgroup SKB

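Source Code

The full code for the example in this chapter is available here.

What is Cgroup SKB?

Cgroup SKB programs are attached to v2 cgroups and are triggered by network traffic (egress or ingress) associated with processes inside the given cgroup. They allow intercepting and filtering traffic associated with particular cgroups (and therefore containers).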

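What's the difference between Cgroup SKB and classifiers?

Both Cgroup SKB and classifiers receive the same type of context: SkBuffContext.

The difference is that classifiers are attached to a network interface, whereas Cgroup SKB programs are attached to a cgroup.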

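Example project

This example is similar to the classifier example: a program that allows dropping egress traffic for a specific cgroup.

Design

We will:

Create a HashMap that acts as a blocklist.
Check the destination IP address of each packet against the HashMap to make a policy decision (pass or drop).
Add entries to the blocklist from userspace.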
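The design above can be sketched in plain Rust before any eBPF is involved. This is a minimal illustration only: the `Blocklist` type and its methods are hypothetical stand-ins for the eBPF map and are not part of the example project. The verdict values (0 to drop, 1 to pass) match the ones used later in this chapter.

```rust
use std::collections::HashMap;
use std::net::Ipv4Addr;

/// Verdicts returned by the filter: 0 drops the packet, 1 lets it through.
const DROP: i32 = 0;
const PASS: i32 = 1;

/// Hypothetical userspace stand-in for the eBPF blocklist map.
/// Keys are IPv4 addresses as host-order u32 values; the value is unused.
struct Blocklist {
    entries: HashMap<u32, u32>,
}

impl Blocklist {
    fn new() -> Self {
        Blocklist { entries: HashMap::new() }
    }

    /// Adds an address to the blocklist (the "userspace" side of the design).
    fn block(&mut self, addr: Ipv4Addr) {
        self.entries.insert(u32::from(addr), 0);
    }

    /// Makes the policy decision for one destination address.
    fn verdict(&self, destination: Ipv4Addr) -> i32 {
        if self.entries.contains_key(&u32::from(destination)) {
            DROP
        } else {
            PASS
        }
    }
}
```

Only the presence of a key matters here, which is why the stored value is always 0.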

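Generating bindings to vmlinux.h

In this example we will use a kernel struct called iphdr, which represents the IP protocol header. We need to generate Rust bindings for it.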
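To get a feel for what the generated binding describes, here is a hand-written, simplified mirror of the IPv4 header layout. It is an assumption-laden sketch: the `Ipv4Header` name is hypothetical, and the version and header-length bitfields are folded into a single byte, so it will not match the generated `iphdr` field for field. It does show why the destination address sits 16 bytes into a 20-byte header.

```rust
use std::mem::size_of;
use std::ptr::addr_of;

/// Simplified mirror of the IPv4 header (hypothetical, for illustration only).
/// The generated `iphdr` binding models version/IHL as bitfields instead.
#[repr(C)]
#[derive(Default)]
struct Ipv4Header {
    version_ihl: u8,
    tos: u8,
    tot_len: u16,
    id: u16,
    frag_off: u16,
    ttl: u8,
    protocol: u8,
    check: u16,
    saddr: u32,
    daddr: u32,
}

/// Size in bytes of the header without options.
fn header_len() -> usize {
    size_of::<Ipv4Header>()
}

/// Byte offset of `daddr` from the start of the header, computed from
/// field addresses (the chapter's eBPF code uses `offset_of!` for this).
fn daddr_offset() -> usize {
    let header = Ipv4Header::default();
    let base = addr_of!(header) as usize;
    let field = addr_of!(header.daddr) as usize;
    field - base
}
```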

First, we must make sure that bindgen is installed.

cargo install bindgen-cli

We use xtask to automate the binding generation so that it can be easily reproduced in the future. To do so, add the following code:

xtask/src/codegen.rs
use aya_tool::generate::InputFile;
use std::{fs::File, io::Write, path::PathBuf};

pub fn generate() -> Result<(), anyhow::Error> {
    let dir = PathBuf::from("cgroup-skb-egress-ebpf/src");
    let names: Vec<&str> = vec!["iphdr"];
    let bindings = aya_tool::generate(
        InputFile::Btf(PathBuf::from("/sys/kernel/btf/vmlinux")),
        &names,
        &[],
    )?;
    // Write the bindings to cgroup-skb-egress-ebpf/src/bindings.rs.
    let mut out = File::create(dir.join("bindings.rs"))?;
    write!(out, "{bindings}")?;
    Ok(())
}

xtask/Cargo.toml
[package]
name = "xtask"
version = "0.1.0"
edition = "2021"

[dependencies]
anyhow = "1"
clap = { version = "4.1", features = ["derive"] }
aya-tool = { git = "https://github.com/aya-rs/aya" }

xtask/src/main.rs
mod build_ebpf;
mod codegen;
mod run;

use std::process::exit;

use clap::Parser;

#[derive(Debug, Parser)]
pub struct Options {
    #[clap(subcommand)]
    command: Command,
}

#[derive(Debug, Parser)]
enum Command {
    BuildEbpf(build_ebpf::Options),
    Run(run::Options),
    Codegen,
}

fn main() {
    let opts = Options::parse();

    use Command::*;
    let ret = match opts.command {
        BuildEbpf(opts) => build_ebpf::build_ebpf(opts),
        Run(opts) => run::run(opts),
        Codegen => codegen::generate(),
    };

    if let Err(e) = ret {
        eprintln!("{e:#}");
        exit(1);
    }
}

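Once we've generated the file by running cargo xtask codegen from the root of the project, we can access it by including mod bindings in our eBPF code.

eBPF code

The program starts with the definition of the BLOCKLIST map. To enforce the policy, the program looks up the destination IP address in that map. If a map entry for that address exists, we drop the packet by returning 0. Otherwise, we accept it by returning 1.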
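The lookup key has to be extracted from the packet first. The following userspace sketch shows the idea on a plain byte slice, assuming the buffer starts at the IPv4 header (which is what this chapter's eBPF code relies on when it reads at the offset of `daddr`). The `destination_key` function and the `DADDR_OFFSET` constant are hypothetical names used only for this illustration.

```rust
/// Offset of the destination address inside an IPv4 header.
const DADDR_OFFSET: usize = 16;

/// Reads the destination address from a buffer that starts at the IPv4
/// header and returns it as a host-order u32, or None if the buffer is
/// too short to contain the field.
fn destination_key(packet: &[u8]) -> Option<u32> {
    let field = packet.get(DADDR_OFFSET..DADDR_OFFSET + 4)?;
    // Addresses travel in network byte order, which is big-endian.
    Some(u32::from_be_bytes([field[0], field[1], field[2], field[3]]))
}
```

The resulting value is the same representation that `u32::from(Ipv4Addr)` produces, so it can be compared directly with keys inserted from userspace.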

Here's what the eBPF code looks like:

cgroup-skb-egress-ebpf/src/main.rs
#![no_std]
#![no_main]

use aya_ebpf::{
    macros::{cgroup_skb, map},
    maps::{HashMap, PerfEventArray},
    programs::SkBuffContext,
};
use memoffset::offset_of;

use cgroup_skb_egress_common::PacketLog;

#[allow(non_upper_case_globals)]
#[allow(non_snake_case)]
#[allow(non_camel_case_types)]
#[allow(dead_code)]
mod bindings;
use bindings::iphdr;

#[map]
static EVENTS: PerfEventArray<PacketLog> = PerfEventArray::new(0);

#[map] // (1)
static BLOCKLIST: HashMap<u32, u32> = HashMap::with_max_entries(1024, 0);

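#[cgroup_skb]
pub fn cgroup_skb_egress(ctx: SkBuffContext) -> i32 {
    // Any error while inspecting the packet results in a drop (0).
    match try_cgroup_skb_egress(ctx) {
        Ok(ret) => ret,
        Err(_) => 0,
    }
}

// (2)
fn block_ip(address: u32) -> bool {
    unsafe { BLOCKLIST.get(&address).is_some() }
}

fn try_cgroup_skb_egress(ctx: SkBuffContext) -> Result<i32, i64> {
    // `protocol` holds the EtherType in network byte order.
    // Anything that is not IPv4 is let through.
    let protocol = unsafe { (*ctx.skb.skb).protocol };
    if protocol != ETH_P_IP {
        return Ok(1);
    }

    // Read the destination address from the IP header and convert it
    // from network byte order to host byte order.
    let destination = u32::from_be(ctx.load(offset_of!(iphdr, daddr))?);

    // (3)
    let action = if block_ip(destination) { 0 } else { 1 };

    let log_entry = PacketLog {
        ipv4_address: destination,
        action,
    };
    EVENTS.output(&ctx, &log_entry, 0);
    Ok(action)
}

// ETH_P_IP (0x0800) as it appears in `protocol` on a little-endian target.
const ETH_P_IP: u32 = 8;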

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    unsafe { core::hint::unreachable_unchecked() }
}

(1) Create our map.
(2) Check whether the packet should be allowed or denied.
(3) Return the correct action.
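The constant `ETH_P_IP` is defined as 8 in the code above, although the EtherType for IPv4 is 0x0800. The difference comes from byte order: the protocol field is stored in network byte order, and the example is built for a little-endian BPF target (`bpfel-unknown-none`). A small sketch of that conversion follows; the names `ETH_P_IP_HOST` and `eth_p_ip_network_order` are hypothetical and do not appear in the example project.

```rust
/// EtherType for IPv4, written in the usual host-order notation.
const ETH_P_IP_HOST: u16 = 0x0800;

/// The value to compare against a protocol field that is stored in
/// network (big-endian) byte order and read as a native integer.
fn eth_p_ip_network_order() -> u32 {
    // `to_be` swaps the two bytes on little-endian machines and is a
    // no-op on big-endian ones.
    u32::from(ETH_P_IP_HOST.to_be())
}
```

On a little-endian machine the function returns 8, matching the hard-coded constant; on a big-endian machine it returns 0x0800, so a hard-coded 8 would only be appropriate for little-endian targets.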

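Userspace code

The purpose of the userspace code is to load the eBPF program, attach it to the cgroup, and then populate the map with the addresses to block.

In this example, we'll block all egress traffic going to 1.1.1.1.

Here's what the code looks like: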

cgroup-skb-egress/src/main.rs
use std::net::Ipv4Addr;

use aya::{
    include_bytes_aligned,
    maps::{perf::AsyncPerfEventArray, HashMap},
    programs::{CgroupAttachMode, CgroupSkb, CgroupSkbAttachType},
    util::online_cpus,
    Ebpf,
};
use bytes::BytesMut;
use clap::Parser;
use log::info;
use tokio::{signal, task};

use cgroup_skb_egress_common::PacketLog;

#[derive(Debug, Parser)]
struct Opt {
    #[clap(short, long, default_value = "/sys/fs/cgroup/unified")]
    cgroup_path: String,
}

#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
    let opt = Opt::parse();

    env_logger::init();

    // This will include your eBPF object file as raw bytes at compile-time and load it at
    // runtime. This approach is recommended for most real-world use cases. If you would
    // like to specify the eBPF program at runtime rather than at compile-time, you can
    // reach for `Ebpf::load_file` instead.
    #[cfg(debug_assertions)]
    let mut bpf = Ebpf::load(include_bytes_aligned!(
        "../../target/bpfel-unknown-none/debug/cgroup-skb-egress"
    ))?;
    #[cfg(not(debug_assertions))]
    let mut bpf = Ebpf::load(include_bytes_aligned!(
        "../../target/bpfel-unknown-none/release/cgroup-skb-egress"
    ))?;
    let program: &mut CgroupSkb =
        bpf.program_mut("cgroup_skb_egress").unwrap().try_into()?;
    let cgroup = std::fs::File::open(opt.cgroup_path)?;
    // (1)
    program.load()?;
    // (2)
    program.attach(
        cgroup,
        CgroupSkbAttachType::Egress,
        CgroupAttachMode::Single,
    )?;

    let mut blocklist: HashMap<_, u32, u32> =
        HashMap::try_from(bpf.map_mut("BLOCKLIST").unwrap())?;

    // Ipv4Addr converts infallibly into a host-order u32.
    let block_addr: u32 = Ipv4Addr::new(1, 1, 1, 1).into();

    // (3)
    blocklist.insert(block_addr, 0, 0)?;

    let mut perf_array =
        AsyncPerfEventArray::try_from(bpf.take_map("EVENTS").unwrap())?;

    for cpu_id in online_cpus()? {
        let mut buf = perf_array.open(cpu_id, None)?;

        task::spawn(async move {
            let mut buffers = (0..10)
                .map(|_| BytesMut::with_capacity(1024))
                .collect::<Vec<_>>();

            loop {
                let events = buf.read_events(&mut buffers).await.unwrap();
                for buf in buffers.iter_mut().take(events.read) {
                    let ptr = buf.as_ptr() as *const PacketLog;
                    let data = unsafe { ptr.read_unaligned() };
                    let dst_addr = Ipv4Addr::from(data.ipv4_address);
                    info!("LOG: DST {}, ACTION {}", dst_addr, data.action);
                }
            }
        });
    }

    info!("Waiting for Ctrl-C...");
    signal::ctrl_c().await?;
    info!("Exiting...");

    Ok(())
}

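(1) Load the eBPF program.
(2) Attach it to the given cgroup.
(3) Populate the map with the remote IP addresses whose egress traffic we want to block.

The third step is done by getting a reference to the BLOCKLIST map and calling blocklist.insert. Using Rust's Ipv4Addr type lets us start from the human-readable representation of the IP address and convert it to a u32, which is the appropriate type to use in eBPF maps.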
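The conversion between the readable form and the map key can be tried in isolation with the standard library alone. The helper names below (`parse_key`, `key_to_addr`) are hypothetical and exist only for this sketch.

```rust
use std::net::{AddrParseError, Ipv4Addr};

/// Parses a dotted-quad string and returns the u32 used as the map key.
/// The first octet ends up in the most significant byte.
fn parse_key(text: &str) -> Result<u32, AddrParseError> {
    let addr: Ipv4Addr = text.parse()?;
    Ok(u32::from(addr))
}

/// Converts a map key back into an address, e.g. for logging.
fn key_to_addr(key: u32) -> Ipv4Addr {
    Ipv4Addr::from(key)
}
```

For example, `parse_key("1.1.1.1")` yields `0x01010101`, and `key_to_addr` turns that value back into `1.1.1.1`.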

Testing the program

First, check where cgroup v2 is mounted:

$ mount | grep cgroup2
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)

The most common locations are /sys/fs/cgroup and /sys/fs/cgroup/unified.
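If the check needs to be done programmatically, the same information is available in /proc/mounts, where each line lists the device, the mount point, the filesystem type, and the options. The sketch below parses text in that layout; `find_cgroup2_mount` is a hypothetical helper, not something the example project provides.

```rust
/// Returns the mount point of the first cgroup2 filesystem found in
/// `mounts`, which is expected to use the /proc/mounts layout:
/// device, mount point, filesystem type, options, ...
fn find_cgroup2_mount(mounts: &str) -> Option<&str> {
    mounts.lines().find_map(|line| {
        let mut fields = line.split_whitespace();
        let _device = fields.next()?;
        let mount_point = fields.next()?;
        let fs_type = fields.next()?;
        if fs_type == "cgroup2" {
            Some(mount_point)
        } else {
            None
        }
    })
}
```

A caller would typically pass in the result of `std::fs::read_to_string("/proc/mounts")` and use the returned path as the base directory for the new cgroup.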

Inside that location, we need to create a new cgroup (as root):

# mkdir /sys/fs/cgroup/foo

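Then run the program. It attaches to the cgroup given by the --cgroup-path option (default: /sys/fs/cgroup/unified), so that path should be the cgroup created above or one of its ancestors for the filter to apply: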

RUST_LOG=info cargo xtask run

Then, in a separate terminal, as root, try to access 1.1.1.1:

# bash -c "echo \$\$ >> /sys/fs/cgroup/foo/cgroup.procs && curl 1.1.1.1"

That command should hang, and the logs of our program should look like this:

LOG: DST 1.1.1.1, ACTION 0
LOG: DST 1.1.1.1, ACTION 0
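The shell one-liner above works by writing the shell's own PID to the cgroup's cgroup.procs file before running curl, so that curl inherits the cgroup membership. The same step can be sketched in Rust; `join_cgroup` is a hypothetical helper, and on a real system it needs sufficient privileges to write to the cgroup.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Moves the calling process into the cgroup at `cgroup_dir` by writing
/// its PID to the `cgroup.procs` file inside that directory.
fn join_cgroup(cgroup_dir: &Path) -> io::Result<()> {
    let mut procs = OpenOptions::new()
        .write(true)
        .open(cgroup_dir.join("cgroup.procs"))?;
    write!(procs, "{}", std::process::id())
}
```

After a successful `join_cgroup(Path::new("/sys/fs/cgroup/foo"))`, child processes started by the caller run inside the same cgroup and are subject to the attached filter.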

On the other hand, accessing any other address should succeed, for example:

# bash -c "echo \$\$ >> /sys/fs/cgroup/foo/cgroup.procs && curl google.com"
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

And it should produce the following logs:

LOG: DST 192.168.88.10, ACTION 1
LOG: DST 192.168.88.10, ACTION 1
LOG: DST 172.217.19.78, ACTION 1
LOG: DST 172.217.19.78, ACTION 1
LOG: DST 172.217.19.78, ACTION 1
LOG: DST 172.217.19.78, ACTION 1
LOG: DST 172.217.19.78, ACTION 1
LOG: DST 172.217.19.78, ACTION 1
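Each of these log lines is produced by decoding one raw event into a PacketLog record and formatting it. The sketch below reproduces that step without any eBPF dependencies. The `PacketLog` layout shown here (a u32 address followed by an i32 action) is an assumption, since the shared crate that defines it is not listed in this chapter, and the `decode` and `format_log` helpers are hypothetical. Unlike the userspace code above, this sketch checks the buffer length before reading.

```rust
use std::mem::size_of;
use std::net::Ipv4Addr;

/// Assumed layout of the record shared between eBPF and userspace.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct PacketLog {
    ipv4_address: u32,
    action: i32,
}

/// Decodes one record from the start of `buf`, or returns None if the
/// buffer is too short.
fn decode(buf: &[u8]) -> Option<PacketLog> {
    if buf.len() < size_of::<PacketLog>() {
        return None;
    }
    let ptr = buf.as_ptr() as *const PacketLog;
    // SAFETY: the length was checked above, `read_unaligned` has no
    // alignment requirement, and PacketLog consists of plain integers.
    Some(unsafe { ptr.read_unaligned() })
}

/// Formats a record the same way the userspace program logs it.
fn format_log(entry: &PacketLog) -> String {
    format!(
        "LOG: DST {}, ACTION {}",
        Ipv4Addr::from(entry.ipv4_address),
        entry.action
    )
}
```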