eBPF Learning (3)

Author: android小奉先 | Published 2024-09-16 22:26

    Overview

    This article covers the tracing side of eBPF: using bcc (BPF Compiler Collection) to collect system information from a variety of sources.

    Probes

    A probe is a mechanism for collecting dynamic information from a running system. With probes you can analyze what applications and the kernel are doing, which helps a great deal when studying their behavior.
    Traditionally, probe code was built into a kernel module (.ko) and loaded into the kernel, so a bug in the probe itself could take the whole system down with it, e.g. with a crash.
    With BPF, the BPF verifier vets every program before it runs, which keeps probe code safe.
    Next we go through the probe types and how to use each of them from bcc.

    kernel probes

    Kprobes work by placing dynamic markers on kernel instructions, at very low overhead. When the kernel reaches such a marker, it first runs the attached probe code and then resumes the original instructions, much like an inline hook on a function: on entry the probe runs first, then execution jumps back into the function.
    Because this relies on kernel instruction-level interfaces with no stable ABI, it is prone to compatibility problems across kernel versions.
    There are two kinds of kernel probes: as the names suggest, kprobes fire on function entry and kretprobes fire on function return.
    Let's demonstrate kprobes with BCC in C++; the demo code is as follows:

    #include <string>
    #include <iostream>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fstream>
    
    #include <bcc/BPF.h>
    
    const std::string BPF_PROGRAM = R"(
    
    #include <linux/ptrace.h>
    
    int do_sys_execve(struct pt_regs *ctx, char *filename, char *argv, char * envp) {
      char comm[16] = {0};
      bpf_get_current_comm(comm, sizeof(comm));
      bpf_trace_printk("execve new process %s\n", comm);
      return 0;
    }
    
    int ret_sys_execve(struct pt_regs *ctx) {
      int return_value = PT_REGS_RC(ctx);
      bpf_trace_printk("program return value is %d\n", return_value);
      return 0;
    }
    
    )";
    
    int main(){
      ebpf::BPF bpf;
      auto init_res = bpf.init(BPF_PROGRAM);
      if (!init_res.ok()) {
        std::cout<<init_res.msg() << std::endl;
        return 1;
      }
      std::cout<<"bpf init ok"<<std::endl;
    
      std::string execv_fnname = bpf.get_syscall_fnname("execve");
    
      auto attach_res = bpf.attach_kprobe(execv_fnname, "do_sys_execve", 0, BPF_PROBE_ENTRY);
      if (!attach_res.ok()) {
        std::cout<<attach_res.msg()<<std::endl;
        return 1;
      }
    
      attach_res = bpf.attach_kprobe(execv_fnname, "ret_sys_execve", 0, BPF_PROBE_RETURN);
      if (!attach_res.ok()) {
        std::cout<<attach_res.msg()<<std::endl;
        return 1;
      }
      std::cout<<"bpf attach ok"<<std::endl;
      
      std::ifstream pipe("/sys/kernel/debug/tracing/trace_pipe");
    
      int count = 0;
     
      while(true) {
        std::string line;
        if(std::getline(pipe,line)) {
          std::cout<<line<<std::endl;
          count++;
          if (count > 50) {
            bpf.detach_kprobe(execv_fnname, BPF_PROBE_ENTRY);
            bpf.detach_kprobe(execv_fnname, BPF_PROBE_RETURN);
            break;
          }
        } else {
          sleep(1);
        }
      }
      return 0;
    }
    
    

    Building the code above requires a bcc environment, usually set up from source; see bcc's INSTALL.md for details. You may hit some issues depending on your setup. My machine is Ubuntu 24.04.1 LTS with kernel Linux 6.8.0-44-generic (when I started working, the kernel was still 3.9; now it is 6.8, which brings a touch of nostalgia). Once the environment is ready, compile and run:

    bpf init ok
    bpf attach ok
               <...>-518742  [001] ...21 89067.095248: bpf_trace_printk: execve new process code
    
               <...>-518742  [001] ...21 89067.095268: bpf_trace_printk: program return value is -2
    
               <...>-518742  [001] ...21 89067.095286: bpf_trace_printk: execve new process code
    
               <...>-518742  [001] ...21 89067.095292: bpf_trace_printk: program return value is -2
    
               <...>-518742  [001] ...21 89067.095298: bpf_trace_printk: execve new process code
    
               <...>-518742  [001] ...21 89067.095305: bpf_trace_printk: program return value is -2
    
               <...>-518742  [001] ...21 89067.095311: bpf_trace_printk: execve new process code
    
               <...>-518742  [001] ...21 89067.095317: bpf_trace_printk: program return value is -2
    
               <...>-518742  [001] ...21 89067.095323: bpf_trace_printk: execve new process code
    
               <...>-518742  [001] ...21 89067.095329: bpf_trace_printk: program return value is -2
    
               <...>-518742  [001] ...21 89067.095335: bpf_trace_printk: execve new process code
    
               <...>-518742  [001] ...21 89067.100773: bpf_trace_printk: program return value is 0
    
               <...>-518749  [000] ...21 89067.625411: bpf_trace_printk: execve new process code
    
               <...>-518749  [000] ...21 89067.625429: bpf_trace_printk: program return value is -2
    
               <...>-518749  [000] ...21 89067.625438: bpf_trace_printk: execve new process code
    
               <...>-518749  [000] ...21 89067.625442: bpf_trace_printk: program return value is -2
    
               <...>-518749  [000] ...21 89067.625445: bpf_trace_printk: execve new process code
    
               <...>-518749  [000] ...21 89067.625448: bpf_trace_printk: program return value is -2
    
               <...>-518749  [000] ...21 89067.625451: bpf_trace_printk: execve new process code
    
               <...>-518749  [000] ...21 89067.625454: bpf_trace_printk: program return value is -2
    
               <...>-518749  [000] ...21 89067.625456: bpf_trace_printk: execve new process code
    
               <...>-518749  [000] ...21 89067.629358: bpf_trace_printk: program return value is 0
    
               <...>-518756  [003] ...21 89069.133218: bpf_trace_printk: execve new process code
    
               <...>-518756  [003] ...21 89069.133239: bpf_trace_printk: program return value is -2
    
               <...>-518756  [003] ...21 89069.133257: bpf_trace_printk: execve new process code
    
               <...>-518756  [003] ...21 89069.133263: bpf_trace_printk: program return value is -2
    
    

    The code above involves three key functions:

      StatusTuple init(const std::string& bpf_program,
                       const std::vector<std::string>& cflags = {},
                       const std::vector<USDT>& usdt = {});
      StatusTuple attach_kprobe(const std::string& kernel_func,
                                const std::string& probe_func,
                                uint64_t kernel_func_offset = 0,
                                bpf_probe_attach_type = BPF_PROBE_ENTRY,
                                int maxactive = 0);
      StatusTuple detach_kprobe(
          const std::string& kernel_func,
          bpf_probe_attach_type attach_type = BPF_PROBE_ENTRY);
    

    The usage is fairly clear from the parameters alone.
    One detail worth adding: the first parameter of a probe function is always a struct pt_regs*. What is it? It stores the register context of the process running on the CPU; the exact contents depend on the processor (arm, x86, and so on). For example, on arm64, which I work with most, it is defined as:

    /*
     * This struct defines the way the registers are stored on the stack during an
     * exception. Note that sizeof(struct pt_regs) has to be a multiple of 16 (for
     * stack alignment). struct user_pt_regs must form a prefix of struct pt_regs.
     */
    struct pt_regs {
        union {
            struct user_pt_regs user_regs;
            struct {
                u64 regs[31];
                u64 sp;
                u64 pc;
                u64 pstate;
            };
        };
        u64 orig_x0;
    #ifdef __AARCH64EB__
        u32 unused2;
        s32 syscallno;
    #else
        s32 syscallno;
        u32 unused2;
    #endif
        u64 sdei_ttbr1;
        /* Only valid when ARM64_HAS_GIC_PRIO_MASKING is enabled. */
        u64 pmr_save;
        u64 stackframe[2];
    
        /* Only valid for some EL1 exceptions. */
        u64 lockdep_hardirqs;
        u64 exit_rcu;
    };
    
    

    As the demo shows, kprobes depend on the kernel's implementation details and carry some compatibility risk. Next let's look at a stable alternative.

    Tracepoints

    Compared with kprobes, tracepoints sit at fixed locations in the kernel, which is why they are called static instrumentation. The kernel guarantees compatibility for these markers: a tracepoint present in an old version will also exist in newer ones. The trade-off is that the set of tracepoints is limited, so they are less flexible than kprobes.
    You can list the tracepoints supported by your system with:

    root@shanks-ThinkPad-T460s:/sys/kernel/debug/tracing/events# ls 
    alarmtimer        devlink         hda_intel     irq            mdio       page_isolation  rv                       thp
    amd_cpu           dma_fence       header_event  irq_matrix     mei        pagemap         sched                    timer
    asoc              drm             header_page   irq_vectors    migrate    page_pool       scsi                     tlb
    avc               e1000e_trace    huge_memory   iwlwifi        mmap       percpu          sd                       tls
    block             enable          hwmon         iwlwifi_data   mmap_lock  power           signal                   udp
    bpf_test_run      error_report    hyperv        iwlwifi_io     mmc        printk          skb                      v4l2
    bpf_trace         exceptions      i2c           iwlwifi_msg    module     pwm             smbus                    vb2
    bridge            ext4            i915          iwlwifi_ucode  mptcp      qdisc           sock                     vmalloc
    cfg80211          fib             initcall      jbd2           msr        qrtr            sof                      vmscan
    cgroup            fib6            intel_avs     kmem           napi       ras             sof_intel                vsyscall
    clk               filelock        intel_iommu   ksm            neigh      raw_syscalls    spi                      watchdog
    compaction        filemap         intel-sst     libata         net        rcu             swiotlb                  wbt
    context_tracking  fs_dax          interconnect  lock           netlink    regmap          sync_trace               workqueue
    cpuhp             ftrace          iocost        mac80211       nmi        regulator       syscalls                 writeback
    cros_ec           gpio            iomap         mac80211_msg   notifier   resctrl         task                     x86_fpu
    csd               handshake       iommu         maple_tree     nvme       rpm             tcp                      xdp
    dev               hda             io_uring      mce            oom        rseq            thermal                  xen
    devfreq           hda_controller  ipi           mctp           osnoise    rtc             thermal_power_allocator  xhci-hcd
    
    

    Each directory contains two files, enable and filter: the former turns the tracepoint on, the latter filters events. Now let's look at an example:

    #include <string>
    #include <iostream>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fstream>
    
    #include <bcc/BPF.h>
    
    const std::string BPF_PROGRAM = R"(
    
    #include <linux/ptrace.h>
    #include <linux/sched.h>
    
    int on_sched_switch(struct tracepoint__sched__sched_switch *args) {
      // args exposes the tracepoint's fields; print the task being switched in
      bpf_trace_printk("sched switch to %s\n", args->next_comm);
      return 0;
    }
    
    )";
    
    int main(){
      ebpf::BPF bpf;
      auto init_res = bpf.init(BPF_PROGRAM);
      if (!init_res.ok()) {
        std::cout<<init_res.msg() << std::endl;
        return 1;
      }
      std::cout<<"bpf init ok"<<std::endl;
    
      auto attach_res = bpf.attach_tracepoint("sched:sched_switch", "on_sched_switch");
      if (!attach_res.ok()) {
        std::cout<<attach_res.msg() << std::endl;
        return 1;
      }
      std::cout<<"attach tracepoint ok"<<std::endl;
      std::ifstream pipe("/sys/kernel/debug/tracing/trace_pipe");
    
      int count = 0;
     
      while(true) {
        std::string line;
        if(std::getline(pipe,line)) {
          std::cout<<line<<std::endl;
          count++;
          if (count > 10) {
            bpf.detach_tracepoint("sched:sched_switch");
            break;
          }
        } else {
          sleep(1);
        }
      }
      return 0;
    }
    
    

    Compile and run it to get output like this:

    bpf init ok
    attach tracepoint ok
              <idle>-0       [003] d..2. 108816.680677: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=docker next_pid=652525 next_prio=120
              docker-652525  [003] d..2. 108816.680683: sched_switch: prev_comm=docker prev_pid=652525 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
              <idle>-0       [003] d..2. 108816.681607: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=docker next_pid=652525 next_prio=120
              docker-652525  [003] d..2. 108816.681618: sched_switch: prev_comm=docker prev_pid=652525 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
              <idle>-0       [003] d..2. 108816.681623: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=docker next_pid=652525 next_prio=120
              docker-652525  [003] d..2. 108816.681649: sched_switch: prev_comm=docker prev_pid=652525 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
              <idle>-0       [003] d..2. 108816.681710: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=docker next_pid=652525 next_prio=120
              docker-652525  [003] d..2. 108816.681724: sched_switch: prev_comm=docker prev_pid=652525 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
              <idle>-0       [003] d..2. 108816.681739: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=docker next_pid=652525 next_prio=120
              docker-652525  [003] d..2. 108816.681750: sched_switch: prev_comm=docker prev_pid=652525 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
              <idle>-0       [003] d..2. 108816.681783: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=docker next_pid=652525 next_prio=120
    
    

    uprobes

    Having seen the kernel-side mechanisms, the obvious question is whether user space supports something similar. It does: user space can register probes too, again split into an unstable dynamic flavor and a stable static one.

    Let's start with the unstable flavor, uprobes. They depend on the target function's signature: if the signature changes, the probe breaks. The following example measures the execution time of a user-space function. The user-space code under test:

    #include <iostream>
    #include <unistd.h>
    
    int foo1(int v) {
      return v + 1;
    }
    
    int main() {
      int v = 1;
      v = foo1(v);
      std::cout<<v<<std::endl;
      return 0;
    }
    
    

    The target here is foo1. Next let's see how to probe it without modifying the code; the bcc side is as follows:

    #include <string>
    #include <iostream>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fstream>
    
    #include <bcc/BPF.h>
    
    const std::string BPF_PROGRAM = R"(
    
    #include <linux/ptrace.h>
    #include <linux/sched.h>
    
    BPF_HASH(cache, u64,u64);
    
    int trace_start_time(struct pt_regs *ctx) {
      u64 pid = bpf_get_current_pid_tgid();
      u64 start_time_ns = bpf_ktime_get_ns();
      cache.update(&pid, &start_time_ns);
      return 0;
    }
    
    int print_duration(struct pt_regs *ctx) {
      u64 pid = bpf_get_current_pid_tgid();
      u64 *start_time_ns = cache.lookup(&pid);
      if (start_time_ns == NULL) {
         return 0;
      }
      u64 duration_ns = bpf_ktime_get_ns() - *start_time_ns;
      bpf_trace_printk("call duration %llu\n", duration_ns);
      return 0;
    }
    
    )";
    
    int main(){
      ebpf::BPF bpf;
      auto init_res = bpf.init(BPF_PROGRAM);
      if (!init_res.ok()) {
        std::cout<<init_res.msg() << std::endl;
        return 1;
      }
      std::cout<<"bpf init ok"<<std::endl;
    
      auto attach_res = bpf.attach_uprobe("./user.bin", "_Z4foo1i", "trace_start_time", 0, BPF_PROBE_ENTRY);
      if (!attach_res.ok()) {
        std::cout<<attach_res.msg() << std::endl;
        return 1;
      }
      attach_res = bpf.attach_uprobe("./user.bin", "_Z4foo1i", "print_duration", 0, BPF_PROBE_RETURN);
      if (!attach_res.ok()) {
        std::cout<<attach_res.msg() << std::endl;
        return 1;
      }
      std::cout<<"attach uprobe ok"<<std::endl;
      std::ifstream pipe("/sys/kernel/debug/tracing/trace_pipe");
    
      int count = 0;
     
      while(true) {
        std::string line;
        if(std::getline(pipe,line)) {
          std::cout<<line<<std::endl;
          if (!line.empty()) {
            count++;    
          }
          if (count > 10) {
            bpf.detach_uprobe("./user.bin", "_Z4foo1i", 0, BPF_PROBE_ENTRY);
            bpf.detach_uprobe("./user.bin", "_Z4foo1i", 0, BPF_PROBE_RETURN);
            break;
          }
        } else {
          sleep(1);
        }
      }
      return 0;
    }
    
    

    Compile and run:

    bpf init ok
    attach uprobe ok
               <...>-696974  [003] ...11 115306.935932: bpf_trace_printk: call duration 16884
    
               <...>-696982  [001] ...11 115307.566014: bpf_trace_printk: call duration 18168
    
               <...>-696989  [001] ...11 115308.106875: bpf_trace_printk: call duration 47780
    
               <...>-696990  [001] ...11 115308.660956: bpf_trace_printk: call duration 18897
    
               <...>-696998  [001] ...11 115309.261987: bpf_trace_printk: call duration 55242
    
               <...>-696999  [001] ...11 115309.876613: bpf_trace_printk: call duration 17145
    
               <...>-697007  [003] ...11 115310.477063: bpf_trace_printk: call duration 49513
    
               <...>-697015  [002] ...11 115311.181669: bpf_trace_printk: call duration 47936
    
               <...>-697016  [003] ...11 115311.841587: bpf_trace_printk: call duration 17201
    
               <...>-697024  [000] ...11 115312.381908: bpf_trace_printk: call duration 51744
    
               <...>-697097  [000] ...11 115324.187386: bpf_trace_printk: call duration 48442
    

    As you can see, foo1's execution time is captured successfully. The key APIs for uprobes are:

      StatusTuple attach_uprobe(const std::string& binary_path,
                                const std::string& symbol,
                                const std::string& probe_func,
                                uint64_t symbol_addr = 0,
                                bpf_probe_attach_type attach_type = BPF_PROBE_ENTRY,
                                pid_t pid = -1,
                                uint64_t symbol_offset = 0,
                                uint32_t ref_ctr_offset = 0);
      StatusTuple detach_uprobe(const std::string& binary_path,
                                const std::string& symbol, uint64_t symbol_addr = 0,
                                bpf_probe_attach_type attach_type = BPF_PROBE_ENTRY,
                                pid_t pid = -1,
                                uint64_t symbol_offset = 0);
    

    Again, the purpose of each parameter is mostly evident from the API itself.

    USDT

    User Statically Defined Tracepoints are user-space "tracepoints": a stable way to extract information from user code. USDT borrows heavily from DTrace, as the usage below will show. Because USDT is a stable probing mechanism, the user code must define the probes explicitly. Here is how a USDT is defined on the user side:

    #include <sys/sdt.h>
    #include <unistd.h>
    
    int main() {
      DTRACE_PROBE(hello-usdt, probe-main);
      return 0;
    }
    
    

    That's all it takes. Once defined, compile it into an ELF binary and readelf will show the USDT information:

    shanks@shanks-ThinkPad-T460s:~/Documents/01code/ebpf/chapter03$ readelf -n ./usdt.bin 
    
    Displaying notes found in: .note.gnu.property
      Owner                Data size    Description
      GNU                  0x00000020   NT_GNU_PROPERTY_TYPE_0
          Properties: x86 feature: IBT, SHSTK
        x86 ISA needed: x86-64-baseline
    
    Displaying notes found in: .note.gnu.build-id
      Owner                Data size    Description
      GNU                  0x00000014   NT_GNU_BUILD_ID (unique build ID bitstring)
        Build ID: 7054072f29c24f0617dae26545233b7107d84668
    
    Displaying notes found in: .note.ABI-tag
      Owner                Data size    Description
      GNU                  0x00000010   NT_GNU_ABI_TAG (ABI version tag)
        OS: Linux, ABI: 3.2.0
    
    Displaying notes found in: .note.stapsdt
      Owner                Data size    Description
      stapsdt              0x0000002f   NT_STAPSDT (SystemTap probe descriptors)
        Provider: hello-usdt
        Name: probe-main
        Location: 0x0000000000001131, Base: 0x0000000000002004, Semaphore: 0x0000000000000000
        Arguments: 
    

    Now let's see how to probe it with bcc:

    #include <string>
    #include <iostream>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fstream>
    
    #include <bcc/BPF.h>
    
    const std::string BPF_PROGRAM = R"(
    
    #include <linux/ptrace.h>
    #include <linux/sched.h>
    
    int trace_usdt_exec(struct pt_regs *ctx) {
      u64 pid = bpf_get_current_pid_tgid();
      bpf_trace_printk("usdt process running with pid %d\n", pid);
      return 0;
    }
    
    )";
    
    int main(){
      ebpf::USDT usdt("./usdt.bin", "hello-usdt", "probe-main", "trace_usdt_exec");
      ebpf::BPF bpf;
      auto res = bpf.init(BPF_PROGRAM, {}, {usdt});
      if (!res.ok()) {
        std::cout<<res.msg()<<std::endl;
        return 1;
      }
      std::cout<<"bpf init ok"<<std::endl;
      res = bpf.attach_usdt(usdt);
      if (!res.ok()) {
        std::cout<<res.msg() << std::endl;
        return 1;
      }
      std::cout<<"bpf attach usdt ok"<<std::endl;
    
      std::ifstream pipe("/sys/kernel/debug/tracing/trace_pipe");
      int count = 0;
      while(true) {
        std::string line;
        if(std::getline(pipe,line)) {
          std::cout<<line<<std::endl;
          if (!line.empty()) {
            count++;    
          }
          if (count > 10) {
            bpf.detach_usdt(usdt);
            break;
          }
        } else {
          sleep(1);
        }
      }
      return 0;
    }
    
    

    Again, compile and run to see:

    bpf init ok
    bpf attach usdt ok
               <...>-732507  [003] ...11 120226.668339: bpf_trace_printk: usdt process running with pid 732507
    
               <...>-732510  [003] ...11 120227.210096: bpf_trace_printk: usdt process running with pid 732510
    
               <...>-732519  [002] ...11 120227.753714: bpf_trace_printk: usdt process running with pid 732519
    
               <...>-732529  [001] ...11 120228.306039: bpf_trace_printk: usdt process running with pid 732529
    
               <...>-732532  [003] ...11 120228.834264: bpf_trace_printk: usdt process running with pid 732532
    
               <...>-732535  [002] ...11 120229.384237: bpf_trace_printk: usdt process running with pid 732535
    
               <...>-732545  [002] ...11 120229.972352: bpf_trace_printk: usdt process running with pid 732545
    
               <...>-732555  [003] ...11 120230.823713: bpf_trace_printk: usdt process running with pid 732555
    
               <...>-732558  [003] ...11 120231.482447: bpf_trace_printk: usdt process running with pid 732558
    
               <...>-732567  [000] ...11 120232.159065: bpf_trace_printk: usdt process running with pid 732567
    
               <...>-732577  [000] ...11 120232.824309: bpf_trace_printk: usdt process running with pid 732577
    

    The key functions here are:

      StatusTuple attach_usdt(const USDT& usdt, pid_t pid = -1);
      StatusTuple detach_usdt(const USDT& usdt, pid_t pid = -1);
    

    Performance statistics

    If the material above already makes eBPF look powerful, there is more: eBPF can also do performance accounting! Let me briefly introduce on-CPU and off-CPU, two concepts that are central to performance analysis.
    on-CPU: time the CPU spends running the target application.
    off-CPU: time during which the target application is not running on a CPU.
    Stated that way it sounds obvious, but there is real depth here; for example, one can immediately ask how to increase the target application's on-CPU share.
    This section is about how eBPF supports performance statistics, so we only touch on these ideas and will come back to them another time. First, a case that measures cache reference hit rates:

    #include <string>
    #include <iostream>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fstream>
    
    #include <bcc/BPF.h>
    #include <linux/perf_event.h>
    #include <iomanip>
    
    const std::string BPF_PROGRAM = R"(
    
    #include <linux/ptrace.h>
    #include <uapi/linux/bpf_perf_event.h>
    
    struct event_t {
      int cpu;
      int pid;
      char name[16];
    };
    
    BPF_HASH(ref_count, struct event_t);
    BPF_HASH(miss_count, struct event_t);
    
    static inline __attribute__((always_inline)) void get_key(struct event_t *key) {
     key->cpu = bpf_get_smp_processor_id();
     key->pid = bpf_get_current_pid_tgid();
     bpf_get_current_comm(key->name, sizeof(key->name));
    }
    
    int on_cache_miss(struct bpf_perf_event_data *ctx) {
      struct event_t key = {};
      get_key(&key);
      u64 zero = 0, *val;
      val = miss_count.lookup_or_try_init(&key, &zero);
      if (val) {
        *val += ctx->sample_period;
      }
      return 0;
    }
    
    int on_cache_ref(struct bpf_perf_event_data *ctx) {
      struct event_t key = {};
      get_key(&key); 
    
      u64 zero = 0, *val;
      val = ref_count.lookup_or_try_init(&key, &zero);
      if (val) {
        *val += ctx->sample_period;
      }
      return 0; 
    }
    
    )";
    
    struct event_t {
      int cpu;
      int pid;
      char name[16];
    };
    
    int main(){
      ebpf::BPF bpf;
      auto init_res = bpf.init(BPF_PROGRAM);
      if (!init_res.ok()) {
        std::cout<<init_res.msg() << std::endl;
        return 1;
      }
      std::cout<<"bpf init ok"<<std::endl;
    
      auto res = bpf.attach_perf_event(PERF_TYPE_HARDWARE,PERF_COUNT_HW_CACHE_REFERENCES, "on_cache_ref", 100, 0);
      if (!res.ok()) {
        std::cout<<res.msg()<<std::endl;
        return 1;
      }
    
      res = bpf.attach_perf_event(PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_MISSES, "on_cache_miss", 100, 0);
      if (!res.ok()) {
        std::cout<<res.msg()<<std::endl;
        return 1;
      }
      std::cout<<"bpf attach ok"<<std::endl;
      
      std::cout<<"probing ..."<<std::endl;
      sleep(5);
    
      bpf.detach_perf_event(PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_REFERENCES);
      bpf.detach_perf_event(PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_MISSES);
    
      auto refs = bpf.get_hash_table<event_t, uint64_t>("ref_count");
      auto misses = bpf.get_hash_table<event_t, uint64_t>("miss_count");
      for (auto it : refs.get_table_offline()) {
        uint64_t hit;
        auto miss = misses[it.first];
        hit = miss <= it.second ? it.second - miss : 0;
        double ratio = double(hit) / double(it.second) * 100.0;
        std::cout<<"pid " << std::setw(8) << std::setfill(' ') << it.first.pid;
        std::cout<<std::setw(20) << std::setfill(' ')<< std::left << " (" +
              std::string(it.first.name) + ")" <<std::right;
        std::cout<<"on cpu "<<std::setw(2) << std::setfill(' ') << it.first.cpu;
        std::cout<<" hit rate "<< std::setprecision(4)<<ratio<<"% ";
        std::cout<<"(" << hit << "/" << it.second<<")" << std::endl;
      }
      return 0;
    }
    
    

    The output looks like this:

    bpf init ok
    bpf attach ok
    probing ...
    pid    22363 (code)             on cpu  1 hit rate 0% (0/7100)
    pid     3912 (gmain)            on cpu  3 hit rate 73.91% (1700/2300)
    pid     7843 (code)             on cpu  0 hit rate 4.231% (1100/26000)
    pid     7887 (code)             on cpu  3 hit rate 0% (0/5000)
    pid   723596 (kworker/u9:1)     on cpu  3 hit rate 42.42% (22100/52100)
    pid   723596 (kworker/u9:1)     on cpu  1 hit rate 74.21% (70200/94600)
    pid   782016 (docker)           on cpu  3 hit rate 26.04% (4400/16900)
    pid     1590 (containerd)       on cpu  1 hit rate 77.78% (7700/9900)
    pid   782020 (docker)           on cpu  1 hit rate 0% (0/91800)
    pid   782013 (docker)           on cpu  0 hit rate 0% (0/30700)
    pid     4356 (HangWatcher)      on cpu  0 hit rate 10% (300/3000)
    pid   782040 (code)             on cpu  1 hit rate 0% (0/57500)
    pid     1610 (containerd)       on cpu  2 hit rate 39.19% (5800/14800)
    pid   782039 (docker)           on cpu  3 hit rate 0% (0/5000)
    pid    97688 (code)             on cpu  0 hit rate 33.33% (1100/3300)
    pid    97663 (code)             on cpu  1 hit rate 1.918% (4100/213800)
    pid   529775 (cpptools-srv)     on cpu  0 hit rate 71.84% (7400/10300)
    pid    97845 (code)             on cpu  0 hit rate 34.51% (3900/11300)
    pid     7947 (code)             on cpu  0 hit rate 0% (0/12500)
    pid   205757 (HangWatcher)      on cpu  0 hit rate 52.38% (3300/6300)
    pid   782033 (code)             on cpu  0 hit rate 2.41% (800/33200)
    pid   780635 (kworker/0:0)      on cpu  0 hit rate 64.47% (24500/38000)
    pid       18 (migration/0)      on cpu  0 hit rate 50% (500/1000)
    pid     3236 (JS Helper)        on cpu  0 hit rate 63.33% (1900/3000)
    pid     2326 (dockerd)          on cpu  1 hit rate 88.89% (2400/2700)
    pid   782046 (docker)           on cpu  3 hit rate 76.19% (1600/2100)
    pid   782017 (docker)           on cpu  1 hit rate 14.88% (1800/12100)
    pid   782030 (docker)           on cpu  1 hit rate 47.06% (1600/3400)
    pid     3726 (gvfs-afc-volume)  on cpu  1 hit rate 35.71% (1500/4200)
    pid     3226 (gnome-shell)      on cpu  2 hit rate 18.91% (201300/1064400)
    pid       35 (migration/3)      on cpu  3 hit rate 81.48% (2200/2700)
    pid   733067 (cpptools)         on cpu  1 hit rate 82.35% (1400/1700)
    pid     1753 (rtkit-daemon)     on cpu  0 hit rate 39.29% (1100/2800)
    pid   782037 (docker)           on cpu  1 hit rate 80.49% (3300/4100)
    pid       64 (kworker/3:1H)     on cpu  3 hit rate 7.143% (800/11200)
    pid   756219 (kworker/u8:2)     on cpu  0 hit rate 28.1% (3400/12100)
    pid   782035 (docker)           on cpu  2 hit rate 50.63% (4000/7900)
    pid   782028 (docker)           on cpu  2 hit rate 77.57% (8300/10700)
    pid   529963 (cpptools-srv)     on cpu  1 hit rate 64.71% (6600/10200)
    pid     1610 (containerd)       on cpu  3 hit rate 1.347% (400/29700)
    pid    97456 (Compositor)       on cpu  2 hit rate 43.33% (2600/6000)
    pid   782034 (docker)           on cpu  3 hit rate 57.36% (7400/12900)
    pid     4068 (sogoupinyinServ)  on cpu  0 hit rate 19.98% (35800/179200)
    pid     8508 (code)             on cpu  1 hit rate 50% (1900/3800)
    pid     4251 (chrome)           on cpu  2 hit rate 0% (0/31500)
    pid   782014 (docker)           on cpu  0 hit rate 79.31% (6900/8700)
    pid     5785 (gpg-agent)        on cpu  1 hit rate 61.11% (2200/3600)
    pid     8503 (code)             on cpu  2 hit rate 42.11% (800/1900)
    pid     1048 (NetworkManager)   on cpu  1 hit rate 0% (0/2700)
    pid        0 (swapper/1)        on cpu  1 hit rate 79.07% (2133300/2697900)
    pid     7883 (code)             on cpu  2 hit rate 53.47% (5400/10100)
    pid     7883 (code)             on cpu  3 hit rate 62.07% (1800/2900)
    pid   781990 (kworker/u9:2)     on cpu  1 hit rate 79.04% (36200/45800)
    pid     1610 (containerd)       on cpu  1 hit rate 36.23% (22100/61000)
    pid      132 (kworker/0:1H)     on cpu  0 hit rate 0% (0/1500)
    pid     4300 (chrome)           on cpu  2 hit rate 0% (0/4000)
    pid    97663 (code)             on cpu  3 hit rate 0.1463% (200/136700)
    pid   782026 (docker)           on cpu  3 hit rate 0% (0/6400)
    pid       17 (rcu_preempt)      on cpu  0 hit rate 77.66% (22600/29100)
    pid    98376 (cpptools)         on cpu  2 hit rate 33.93% (1900/5600)
    pid   782028 (docker)           on cpu  1 hit rate 96.77% (3000/3100)
    pid      257 (jbd2/nvme0n1p2-)  on cpu  0 hit rate 0% (0/8500)
    pid   782016 (docker)           on cpu  1 hit rate 0% (0/1300)
    pid    97446 (code)             on cpu  1 hit rate 0% (0/15000)
    pid    97674 (code)             on cpu  3 hit rate 27.04% (6300/23300)
    pid     1207 (gmain)            on cpu  0 hit rate 0% (0/2500)
    pid   782030 (docker)           on cpu  2 hit rate 48.55% (6700/13800)
    pid    97674 (code)             on cpu  1 hit rate 86.49% (3200/3700)
    pid   782015 (docker)           on cpu  1 hit rate 50% (200/400)
    pid     3226 (gnome-shell)      on cpu  0 hit rate 52.27% (27600/52800)
    pid     5804 (gnome-terminal-)  on cpu  1 hit rate 4.598% (16300/354500)
    pid     3256 (gnome-shell)      on cpu  1 hit rate 0% (0/53200)
    pid    97673 (code)             on cpu  3 hit rate 0% (0/13000)
    pid     1613 (containerd)       on cpu  0 hit rate 30.89% (35400/114600)
    pid    97684 (code)             on cpu  0 hit rate 64.29% (4500/7000)
    pid   756219 (kworker/u8:2)     on cpu  1 hit rate 28.14% (6500/23100)
    pid     7911 (code)             on cpu  0 hit rate 0% (0/16300)
    pid   779807 (kworker/u9:4)     on cpu  1 hit rate 71.52% (75100/105000)
    pid   782041 (docker)           on cpu  3 hit rate 79.37% (5000/6300)
    pid   733065 (cpptools)         on cpu  0 hit rate 0% (0/100)
    pid       16 (ksoftirqd/0)      on cpu  0 hit rate 54.29% (1900/3500)
    pid   782036 (docker)           on cpu  3 hit rate 0% (0/6500)
    pid       17 (rcu_preempt)      on cpu  1 hit rate 64.39% (8500/13200)
    pid       36 (ksoftirqd/3)      on cpu  3 hit rate 60% (600/1000)
    pid     8483 (code)             on cpu  3 hit rate 0% (0/7000)
    pid       23 (migration/1)      on cpu  1 hit rate 87.5% (700/800)
    pid   189281 (HangWatcher)      on cpu  0 hit rate 8% (400/5000)
    pid   782043 (docker)           on cpu  2 hit rate 37.29% (6600/17700)
    pid   551770 (cpptools-srv)     on cpu  0 hit rate 0% (0/4300)
    pid   782017 (docker)           on cpu  3 hit rate 66.67% (1800/2700)
    pid   781984 (tracker-extract)  on cpu  1 hit rate 0% (0/2400)
    pid   782035 (docker)           on cpu  1 hit rate 31.82% (2800/8800)
    pid      626 (irq/132-iwlwifi)  on cpu  2 hit rate 0% (0/5700)
    pid     8526 (code)             on cpu  3 hit rate 61.9% (2600/4200)
    pid     3235 (JS Helper)        on cpu  1 hit rate 78.12% (2500/3200)
    pid       29 (migration/2)      on cpu  2 hit rate 66.67% (1600/2400)
    pid   760435 (bluetoothd)       on cpu  2 hit rate 7.647% (3900/51000)
    pid     1590 (containerd)       on cpu  0 hit rate 39.39% (1300/3300)
    pid     3238 (KMS thread)       on cpu  0 hit rate 57.74% (577000/999300)
    pid     5804 (gnome-terminal-)  on cpu  3 hit rate 22.27% (31200/140100)
    pid     3226 (gnome-shell)      on cpu  1 hit rate 38.27% (63100/164900)
    pid   782014 (docker)           on cpu  1 hit rate 66.96% (7700/11500)
    pid   529521 (cpptools-srv)     on cpu  0 hit rate 55.06% (4900/8900)
    pid   717380 (kworker/2:2)      on cpu  2 hit rate 56.07% (13400/23900)
    pid   782015 (docker)           on cpu  0 hit rate 1.562% (100/6400)
    pid     1584 (containerd)       on cpu  1 hit rate 53.14% (44000/82800)
    pid       24 (ksoftirqd/1)      on cpu  1 hit rate 46.63% (9700/20800)
    pid   782019 (docker)           on cpu  0 hit rate 31.25% (4500/14400)
    pid     7851 (Chrome_IOThread)  on cpu  1 hit rate 0% (0/7200)
    pid   782013 (code)             on cpu  3 hit rate 0% (0/33900)
    pid   210799 (code)             on cpu  0 hit rate 0% (0/11000)
    pid     7974 (code)             on cpu  3 hit rate 0% (0/5000)
    pid   782024 (docker)           on cpu  3 hit rate 39.13% (900/2300)
    pid     4309 (chrome)           on cpu  1 hit rate 0% (0/8400)
    pid    97450 (Chrome_ChildIOT)  on cpu  3 hit rate 0% (0/9400)
    pid     4297 (chrome)           on cpu  0 hit rate 14.29% (1200/8400)
    pid   760435 (bluetoothd)       on cpu  3 hit rate 0% (0/78800)
    pid       17 (rcu_preempt)      on cpu  3 hit rate 68.29% (14000/20500)
    pid     7910 (CacheThread_Blo)  on cpu  3 hit rate 0% (0/400)
    pid     8492 (code)             on cpu  0 hit rate 0% (0/138200)
    pid     5804 (gnome-terminal-)  on cpu  2 hit rate 27.8% (50200/180600)
    pid     8504 (code)             on cpu  0 hit rate 61.02% (3600/5900)
    pid     8503 (code)             on cpu  1 hit rate 50.43% (17600/34900)
    pid     5804 (gnome-terminal-)  on cpu  0 hit rate 22.74% (6800/29900)
    pid    98373 (cpptools)         on cpu  1 hit rate 0% (0/3000)
    pid   782044 (docker)           on cpu  0 hit rate 18.56% (3100/16700)
    pid     3256 (gnome-shell)      on cpu  0 hit rate 47.1% (12200/25900)
    pid      122 (kworker/2:1H)     on cpu  2 hit rate 51.69% (19900/38500)
    pid   781987 (pool-tracker-ex)  on cpu  0 hit rate 1.099% (100/9100)
    pid     3256 (gnome-shell)      on cpu  2 hit rate 52.11% (43300/83100)
    pid        0 (swapper/0)        on cpu  0 hit rate 75.81% (2063500/2722100)
    pid     7926 (code)             on cpu  0 hit rate 37.96% (4100/10800)
    pid   768320 (kworker/1:2)      on cpu  1 hit rate 40.46% (40300/99600)
    pid   779807 (kworker/u9:4)     on cpu  2 hit rate 76.34% (117800/154300)
    pid   781999 (perf_event.bin)   on cpu  3 hit rate 50% (7300/14600)
    pid   782021 (docker)           on cpu  0 hit rate 0% (0/1800)
    pid   782031 (docker)           on cpu  0 hit rate 32.2% (7600/23600)
    pid   157242 (cpptools)         on cpu  0 hit rate 0% (0/2100)
    pid   547196 (cpptools)         on cpu  1 hit rate 8.209% (1100/13400)
    pid     7884 (code)             on cpu  3 hit rate 0% (0/2000)
    pid      744 (systemd-oomd)     on cpu  0 hit rate 7.031% (1800/25600)
    pid     1002 (avahi-daemon)     on cpu  1 hit rate 0% (0/18400)
    pid   782034 (docker)           on cpu  1 hit rate 91.18% (3100/3400)
    pid   782022 (docker)           on cpu  0 hit rate 62.73% (6900/11000)
    pid   782021 (docker)           on cpu  2 hit rate 80.92% (12300/15200)
    pid   782042 (docker)           on cpu  2 hit rate 86.11% (6200/7200)
    pid   782028 (docker)           on cpu  3 hit rate 25.64% (1000/3900)
    pid     5335 (chrome)           on cpu  2 hit rate 12.17% (1400/11500)
    pid    98376 (cpptools)         on cpu  3 hit rate 12% (300/2500)
    pid     8492 (code)             on cpu  3 hit rate 2.302% (2700/117300)
    pid       30 (ksoftirqd/2)      on cpu  2 hit rate 57.14% (800/1400)
    pid     8505 (code)             on cpu  1 hit rate 44.74% (1700/3800)
    pid        0 (swapper/3)        on cpu  3 hit rate 79.53% (1942800/2443000)
    pid   782027 (code)             on cpu  0 hit rate 0% (0/15000)
    pid     4173 (GUsbEventThread)  on cpu  0 hit rate 33.87% (2100/6200)
    pid   760435 (bluetoothd)       on cpu  0 hit rate 11.59% (2400/20700)
    pid   529521 (cpptools-srv)     on cpu  2 hit rate 0% (0/1000)
    pid    97450 (Chrome_ChildIOT)  on cpu  1 hit rate 54% (2700/5000)
    pid   223024 (HangWatcher)      on cpu  2 hit rate 60.27% (4400/7300)
    pid     7883 (code)             on cpu  0 hit rate 0% (0/1500)
    pid     7887 (code)             on cpu  2 hit rate 8.125% (1300/16000)
    pid     7843 (code)             on cpu  2 hit rate 0% (0/31800)
    pid     7978 (Chrome_ChildIOT)  on cpu  1 hit rate 18.92% (1400/7400)
    pid   767096 (cpptools-srv)     on cpu  0 hit rate 0% (0/7100)
    pid     3237 (JS Helper)        on cpu  0 hit rate 56.14% (3200/5700)
    pid   104929 (code)             on cpu  2 hit rate 0% (0/1600)
    pid     8483 (code)             on cpu  1 hit rate 17.19% (1100/6400)
    pid   782027 (docker)           on cpu  0 hit rate 0% (0/30000)
    pid     4081 (QDBusConnection)  on cpu  3 hit rate 2.419% (300/12400)
    pid    21523 (cpptools)         on cpu  2 hit rate 44.78% (6000/13400)
    pid   782040 (docker)           on cpu  1 hit rate 0% (0/114900)
    pid   782036 (docker)           on cpu  1 hit rate 77.22% (6100/7900)
    pid   782035 (docker)           on cpu  3 hit rate 0% (0/12700)
    pid     1754 (rtkit-daemon)     on cpu  0 hit rate 0% (0/400)
    pid     4305 (Chrome_ChildIOT)  on cpu  0 hit rate 0% (0/3700)
    pid   756219 (kworker/u8:2)     on cpu  3 hit rate 0% (0/4500)
    pid    98373 (cpptools)         on cpu  0 hit rate 12% (600/5000)
    pid     3806 (gmain)            on cpu  0 hit rate 20% (600/3000)
    pid   782033 (docker)           on cpu  2 hit rate 0% (0/78500)
    pid       17 (rcu_preempt)      on cpu  2 hit rate 66.44% (9700/14600)
    pid   723596 (kworker/u9:1)     on cpu  0 hit rate 58.14% (2500/4300)
    pid    97663 (code)             on cpu  2 hit rate 0% (0/11500)
    pid     1590 (containerd)       on cpu  2 hit rate 5.967% (2500/41900)
    pid    97674 (code)             on cpu  0 hit rate 34.04% (6400/18800)
    pid     1127 (gmain)            on cpu  1 hit rate 0% (0/2800)
    pid   647386 (kworker/3:0)      on cpu  3 hit rate 76.49% (104100/136100)
    pid     4302 (HangWatcher)      on cpu  1 hit rate 21.43% (600/2800)
    pid     4171 (gmain)            on cpu  3 hit rate 0% (0/6400)
    pid     1613 (containerd)       on cpu  2 hit rate 36.92% (2400/6500)
    pid    21520 (cpptools)         on cpu  2 hit rate 24.78% (2800/11300)
    pid    98381 (cpptools)         on cpu  2 hit rate 21.88% (2100/9600)
    pid     8526 (code)             on cpu  2 hit rate 51.38% (5600/10900)
    pid   760435 (bluetoothd)       on cpu  1 hit rate 10.54% (6300/59800)
    pid   781990 (kworker/u9:2)     on cpu  2 hit rate 87.27% (9600/11000)
    pid   548101 (cpptools-srv)     on cpu  2 hit rate 33.33% (800/2400)
    pid   223016 (chrome)           on cpu  2 hit rate 0% (0/5100)
    pid     7843 (code)             on cpu  3 hit rate 48.76% (5900/12100)
    pid    97446 (code)             on cpu  0 hit rate 0% (0/124300)
    pid   779807 (kworker/u9:4)     on cpu  3 hit rate 16.24% (4400/27100)
    pid    97685 (code)             on cpu  0 hit rate 61.76% (4200/6800)
    pid     7843 (code)             on cpu  1 hit rate 0% (0/5000)
    pid   530384 (cpptools)         on cpu  3 hit rate 91.3% (2100/2300)
    pid   782023 (docker)           on cpu  2 hit rate 0% (0/11000)
    pid   723596 (kworker/u9:1)     on cpu  2 hit rate 71.65% (73800/103000)
    pid     3912 (gmain)            on cpu  1 hit rate 0% (0/600)
    pid        0 (swapper/2)        on cpu  2 hit rate 79.31% (1658100/2090600)
    pid    21684 (cpptools-srv)     on cpu  3 hit rate 28.57% (1600/5600)
    pid   781466 (DartWorker)       on cpu  2 hit rate 0% (0/6500)
    pid     3463 (snap-store)       on cpu  2 hit rate 0% (0/3700)
    pid     1584 (containerd)       on cpu  3 hit rate 49.72% (8900/17900)
    pid     7886 (code)             on cpu  0 hit rate 46.15% (600/1300)
    pid     1584 (containerd)       on cpu  2 hit rate 60.89% (30200/49600)
    pid     1056 (gmain)            on cpu  0 hit rate 67.74% (2100/3100)
    pid   782029 (docker)           on cpu  1 hit rate 0% (0/19600)
    pid   552090 (cpptools-srv)     on cpu  1 hit rate 60% (3000/5000)
    pid   782013 (docker)           on cpu  3 hit rate 0% (0/85000)
    pid     2359 (nmbd)             on cpu  0 hit rate 0% (0/2300)
    pid     4310 (HangWatcher)      on cpu  2 hit rate 12.5% (400/3200)
    pid   782044 (docker)           on cpu  1 hit rate 0% (0/5300)
    pid   780275 (kworker/u8:3)     on cpu  2 hit rate 15.79% (1500/9500)
    pid     3226 (gnome-shell)      on cpu  3 hit rate 53.48% (514000/961100)
    pid      626 (irq/132-iwlwifi)  on cpu  3 hit rate 0% (0/10300)
    pid   548101 (cpptools-srv)     on cpu  3 hit rate 0% (0/1800)
    pid   782041 (docker)           on cpu  0 hit rate 36.78% (3200/8700)
    pid     7885 (code)             on cpu  2 hit rate 0% (0/5000)
    pid      118 (kworker/1:1H)     on cpu  1 hit rate 0% (0/10000)
    pid     8503 (code)             on cpu  3 hit rate 0% (0/3400)
    pid   545505 (cpptools-srv)     on cpu  2 hit rate 14.71% (500/3400)
    pid    21528 (cpptools)         on cpu  0 hit rate 34.76% (5700/16400)
    pid     7888 (code)             on cpu  3 hit rate 0% (0/5000)
    pid   782020 (code)             on cpu  1 hit rate 0.7194% (300/41700)
    pid   756219 (kworker/u8:2)     on cpu  2 hit rate 0% (0/1200)
    pid     7914 (Chrome_ChildIOT)  on cpu  0 hit rate 0% (0/3700)
    pid   782015 (docker)           on cpu  2 hit rate 78% (3900/5000)
    pid   782029 (docker)           on cpu  2 hit rate 0% (0/3000)
    pid     4273 (HangWatcher)      on cpu  0 hit rate 72.73% (4000/5500)
    pid   545775 (cpptools-srv)     on cpu  2 hit rate 73.33% (5500/7500)
    pid   782033 (docker)           on cpu  0 hit rate 0% (0/35000)
    pid   782042 (docker)           on cpu  3 hit rate 28.23% (5900/20900)
    pid   782037 (docker)           on cpu  0 hit rate 27.37% (2600/9500)
    pid   782015 (docker)           on cpu  3 hit rate 74.07% (2000/2700)
    pid    97845 (code)             on cpu  2 hit rate 65.79% (5000/7600)
    pid   782034 (docker)           on cpu  0 hit rate 67.35% (6600/9800)
    pid     3256 (gnome-shell)      on cpu  3 hit rate 18.69% (4000/21400)
    pid   782025 (docker)           on cpu  0 hit rate 0% (0/1500)
    pid     1010 (irqbalance)       on cpu  2 hit rate 5.426% (700/12900)
    pid   782027 (docker)           on cpu  3 hit rate 0.6472% (400/61800)
    
    

    The core functions involved are the following:

      StatusTuple attach_perf_event(uint32_t ev_type, uint32_t ev_config,
                                    const std::string& probe_func,
                                    uint64_t sample_period, uint64_t sample_freq,
                                    pid_t pid = -1, int cpu = -1,
                                    int group_fd = -1);
      StatusTuple detach_perf_event(uint32_t ev_type, uint32_t ev_config);
    

    Next let's look at another example, one that counts which process names are launched most often. This time we will use a perf buffer. The code is as follows:

    #include <string>
    #include <iostream>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fstream>
    #include <signal.h>
    
    #include <bcc/BPF.h>
    #include <linux/perf_event.h>
    #include <iomanip>
    #include <map>
    
    const std::string BPF_PROGRAM = R"(
    
    #include <linux/ptrace.h>
    
    BPF_PERF_OUTPUT(events);
    
    int do_sys_execve(struct pt_regs *ctx, void *filename, void *argv, void *envp) {
      char comm[16] = {0};
      bpf_get_current_comm(comm, sizeof(comm));
      events.perf_submit(ctx, comm, sizeof(comm));
      return 0;
    }
    
    )";
    
    std::map<std::string, int64_t> exec_map;
    
    bool cancel = false;
    
    void signal_handler(int s) {
      std::cout<<"receive cancel signal" << std::endl;
      cancel = true;
    }
    
    void handle_output(void *cb, void *data, int data_size) {
      char *comm = static_cast<char *>(data);
      exec_map[comm] += 1;
    }
    
    int main(){
      ebpf::BPF bpf;
      auto init_res = bpf.init(BPF_PROGRAM);
      if (!init_res.ok()) {
        std::cout<<init_res.msg() << std::endl;
        return 1;
      }
      std::cout<<"bpf init ok"<<std::endl;
      std::string fnname = bpf.get_syscall_fnname("execve");
      auto res = bpf.attach_kprobe(fnname, "do_sys_execve");
      if (!res.ok()) {
        std::cout<<res.msg()<<std::endl;
        return 1;
      }
      res = bpf.open_perf_buffer("events", &handle_output);
      if (!res.ok()) {
        std::cout<<res.msg()<<std::endl;
        return 1;
      }
    
      std::cout<<"open perf buffer ok"<<std::endl;
      signal(SIGINT, signal_handler);
      while (!cancel) {
        bpf.poll_perf_buffer("events");
      }
    
      for (auto [key, value]: exec_map) {
        std::cout<<key << " " << value<< std::endl;
      }
      return 0;
    }
    

    The output is as follows:

    bpf init ok
    open perf buffer ok
    ^Creceive cancel signal
    bash 3328
    code 220
    

    The call to open_perf_buffer deserves some explanation. In performance analysis, shipping large amounts of data from the kernel to user space is expensive. BPF addresses this with perf buffers: once a perf event buffer is created, each occurrence of the corresponding event in the kernel invokes the callback registered in user space, and that callback does the accounting. Because the statistics are accumulated in user space to begin with, there is no bulk data that has to be transferred from kernel space.
    poll_perf_buffer behaves like I/O multiplexing: it blocks on the buffer and waits for events, waking up once per event. In this example we also registered a handler for the interrupt signal, so after letting the program run for a while and pressing Ctrl-C, we can see which programs were started most often.



    Original link: https://www.haomeiwen.com/subject/qkfiljtx.html