eBPF is a powerful technology that provides efficient, dynamic event tracing in an operating system's kernel space. In this article, we'll take a deep dive into eBPF (extended Berkeley Packet Filter) and its critical role in system call tracing.
BPF, short for Berkeley Packet Filter, made significant strides when it was introduced in the Linux kernel. It evolved from a simple packet-filtering tool into a more versatile and powerful framework for various network-related tasks, driven by the need for efficient packet processing and analysis in modern network environments.
eBPF, or Extended Berkeley Packet Filter, took BPF to a whole new level. It enables sandboxed programs to run inside the operating system kernel, which means application developers can load eBPF programs that extend the kernel's functionality at runtime. The operating system guarantees both safety and execution efficiency close to natively compiled code, thanks to an in-kernel verification engine and a Just-In-Time (JIT) compiler.
Compilation: A compiler translates the eBPF program into bytecode, which a loader program then reads.
Verification: The eBPF verifier examines the program for safety, correctness, and adherence to a set of rules and restrictions. It guarantees that the program does not violate memory-access constraints or destabilize the kernel.
Loading: Once verified, the eBPF program can be loaded into the kernel. The loader ensures that the program is securely loaded and attached to its intended hooks or targets.
Optimization: At runtime, Just-In-Time (JIT) compilation further optimizes the program by translating the eBPF bytecode into machine code that the CPU can execute directly.
eBPF programs are rigorously verified before they are loaded, adding an extra layer of protection against security flaws. Kernel modules, on the other hand, have direct access to kernel code and memory, and can pose a hazard if not written carefully.
JIT compilation of eBPF programs into machine code yields performance well suited to the CPU architecture. However, because kernel modules invoke kernel code directly without going through the BPF subsystem, eBPF instrumentation imposes somewhat greater overhead on the system than kernel module instrumentation.
eBPF supports comprehensive tracing and observability. Programs can be attached to a variety of events, such as system calls, network packets, or other kernel operations, to provide deep insights into system activity. As a result, eBPF is an excellent tool for performance analysis, security monitoring, and debugging. Kernel modules, on the other hand, frequently need more complicated and invasive techniques to attain comparable observability.
The Linux kernel defines a number of instrumentation points where additional code can be inserted to gather data about program execution, and eBPF allows instrumentation code to be injected at these points at runtime. Instrumentation points include tracepoints, kernel probes (kprobes), return probes (kretprobes), and user-space probes (uprobes).
For example, here's an eBPF program that runs whenever the execve system call is invoked.
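A minimal sketch of such a program might look like the following, assuming libbpf's `bpf_helpers.h` header is available (the function name is illustrative):

```c
// execve tracer sketch -- compiled with clang's BPF target.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

// Attach to the tracepoint fired on entry to execve().
SEC("tp/syscalls/sys_enter_execve")
int trace_execve(void *ctx)
{
    // The upper 32 bits of this value hold the process ID.
    __u32 pid = bpf_get_current_pid_tgid() >> 32;

    // Write a message to the kernel trace pipe.
    bpf_printk("execve called by PID %d", pid);
    return 0;
}

// eBPF programs must declare a GPL-compatible license to use
// helpers such as bpf_printk().
char LICENSE[] SEC("license") = "GPL";
```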
The SEC() macro from the bpf/bpf_helpers.h header file is critical in the context of eBPF programming. It allows a programmer to select the section of the eBPF object file into which a function or variable will be placed. This is necessary because the mechanisms for loading eBPF programs into the kernel, such as the bpf() system call, rely on these named sections.
An eBPF loader can rapidly discover and load the relevant code and data because functions and variables are arranged into named sections. For tracepoint events, the section name follows the pattern SEC("tp/<category>/<name>"), where <category> and <name> denote the tracepoint event category and name, respectively.
For example, tp/syscalls/sys_enter_execve is a tracepoint that captures when a process calls the execve system call.
The file /sys/kernel/debug/tracing/available_events contains a list of available tracepoints. Each line in the file follows the format <category>:<name>, such as syscalls:sys_enter_execve.
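To check whether a particular tracepoint exists, you can search this file. For example:

```shell
# List execve-related tracepoints (reading this file
# typically requires root privileges).
grep execve /sys/kernel/debug/tracing/available_events
```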
Use the following command to compile the program:
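A typical clang invocation looks like this (the file names are illustrative):

```shell
# -target bpf emits eBPF bytecode; -g includes BTF debug
# information needed by some loaders.
clang -O2 -g -target bpf -c execve_tracer.bpf.c -o execve_tracer.bpf.o
```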
As a result, a loader program is required to load and attach this eBPF program. The loader is responsible for opening and loading the eBPF object file, checking for errors, locating and attaching the specific eBPF program within the loaded object, and finally closing it. Once attached, the eBPF program executes whenever the corresponding events occur. The loader then enters an indefinite loop, ensuring that the program keeps running until it is stopped manually.
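A sketch of such a loader using libbpf might look like the following (the object file, program, and function names are assumptions; link with -lbpf):

```c
// loader.c -- illustrative libbpf-based loader sketch.
#include <stdio.h>
#include <unistd.h>
#include <bpf/libbpf.h>

int main(void)
{
    // Open and load the compiled eBPF object file.
    struct bpf_object *obj =
        bpf_object__open_file("execve_tracer.bpf.o", NULL);
    if (!obj || bpf_object__load(obj)) {
        fprintf(stderr, "failed to open or load BPF object\n");
        return 1;
    }

    // Locate the eBPF program by its function name and attach it
    // to the tracepoint declared in its SEC() annotation.
    struct bpf_program *prog =
        bpf_object__find_program_by_name(obj, "trace_execve");
    struct bpf_link *link = prog ? bpf_program__attach(prog) : NULL;
    if (!link) {
        fprintf(stderr, "failed to attach BPF program\n");
        bpf_object__close(obj);
        return 1;
    }

    // Keep the process alive so the eBPF program stays attached;
    // stop with Ctrl-C.
    while (1)
        sleep(1);

    bpf_link__destroy(link);
    bpf_object__close(obj);
    return 0;
}
```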
To retrieve logs generated by the bpf_printk function, you can read the file: /sys/kernel/tracing/trace_pipe.
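For example (root privileges are typically required):

```shell
# Stream bpf_printk output; on older kernels the path may be
# /sys/kernel/debug/tracing/trace_pipe instead.
sudo cat /sys/kernel/tracing/trace_pipe
```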
However, manually reading messages from the trace pipe may not be the most efficient approach. It is advantageous to establish a mechanism through which the eBPF program sends messages directly to the loader program. One possible solution is to use ring buffers. Let's delve into the details of ring buffers.
This ring buffer facilitates the transmission of events and data between eBPF programs running in the kernel and user-space applications. It works as a multi-producer, single-consumer (MPSC) queue, allowing safe concurrent sharing across several CPUs.
The eBPF ring buffer is shared across all CPUs, which offers an efficient way to control memory usage and addresses issues with the older perf buffer (perfbuf), including memory overuse and under-allocation.
Here are a few key libbpf functions for writing an eBPF application that transfers data to user-space via a ring buffer:
bpf_ringbuf_reserve(): This function reserves a specified amount of space (in bytes) in a BPF ring buffer.
bpf_probe_read_user_str(): This function reads a null-terminated string from user-space memory into the destination (dst) buffer in kernel space.
bpf_ringbuf_submit(): This function submits data that was previously reserved in a ring buffer.
bpf_object__find_map_fd_by_name(): This function finds the file descriptor of a named map.
bpf_program__attach(): This function attaches an eBPF program to its declared hook, such as a kernel tracepoint.
ring_buffer__new(): This function creates and opens a new ring buffer manager.
ring_buffer__consume(): This function removes, or consumes, data from a ring buffer.
BTF (BPF Type Format) provides a means to describe the data types used by eBPF programs. This enhances type safety, debugging, and introspection.
Now, let's create a program that sends data to user-space using a ring buffer:
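A sketch of such a program is shown below, assuming libbpf's `bpf_helpers.h` header. The map name, event struct, and tracepoint context layout are illustrative; the context struct mirrors the fields described in the tracepoint's `format` file:

```c
// ring buffer tracer sketch -- compiled with clang's BPF target.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define TASK_COMM_LEN 16
#define MAX_FILENAME_LEN 256

// Event layout shared with the user-space loader.
struct event {
    int pid;
    char comm[TASK_COMM_LEN];
    char filename[MAX_FILENAME_LEN];
};

// Ring buffer map, shared across all CPUs; 256 KiB of storage.
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 256 * 1024);
} rb SEC(".maps");

// Hand-written tracepoint context for sys_enter_execve; normally
// generated from BTF (e.g. via vmlinux.h).
struct sys_enter_execve_ctx {
    unsigned long long unused;
    long syscall_nr;
    const char *filename;
    const char *const *argv;
    const char *const *envp;
};

SEC("tp/syscalls/sys_enter_execve")
int trace_execve(struct sys_enter_execve_ctx *ctx)
{
    // Reserve space in the ring buffer for one event.
    struct event *e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0);
    if (!e)
        return 0;

    e->pid = bpf_get_current_pid_tgid() >> 32;
    bpf_get_current_comm(&e->comm, sizeof(e->comm));
    // Copy the executable path from user-space memory.
    bpf_probe_read_user_str(&e->filename, sizeof(e->filename),
                            ctx->filename);

    // Make the event visible to the user-space consumer.
    bpf_ringbuf_submit(e, 0);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```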
Once the program is created, you can write a loader to load this eBPF program:
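A loader sketch using libbpf might look like this (object file, map, and program names are assumptions and must match the kernel side; link with -lbpf):

```c
// ring buffer loader sketch using libbpf.
#include <stdio.h>
#include <unistd.h>
#include <bpf/libbpf.h>

struct event {                 // must match the eBPF side
    int pid;
    char comm[16];
    char filename[256];
};

// Callback invoked for every event consumed from the ring buffer.
static int handle_event(void *ctx, void *data, size_t len)
{
    const struct event *e = data;
    printf("pid=%d comm=%s file=%s\n", e->pid, e->comm, e->filename);
    return 0;
}

int main(void)
{
    struct bpf_object *obj =
        bpf_object__open_file("ringbuf_tracer.bpf.o", NULL);
    if (!obj || bpf_object__load(obj)) {
        fprintf(stderr, "failed to open or load BPF object\n");
        return 1;
    }

    struct bpf_program *prog =
        bpf_object__find_program_by_name(obj, "trace_execve");
    struct bpf_link *link = prog ? bpf_program__attach(prog) : NULL;
    if (!link) {
        fprintf(stderr, "failed to attach BPF program\n");
        return 1;
    }

    // Look up the ring buffer map's fd and create a consumer.
    int map_fd = bpf_object__find_map_fd_by_name(obj, "rb");
    struct ring_buffer *rb =
        ring_buffer__new(map_fd, handle_event, NULL, NULL);
    if (!rb) {
        fprintf(stderr, "failed to create ring buffer manager\n");
        return 1;
    }

    // Poll for events until interrupted.
    while (1) {
        ring_buffer__consume(rb);
        sleep(1);
    }

    ring_buffer__free(rb);
    bpf_link__destroy(link);
    bpf_object__close(obj);
    return 0;
}
```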
The loader's endless loop is critical for continually checking the ring buffer for new events. Without it, the application would only consume events that were already in the buffer when ring_buffer__consume() was first called. By looping and calling ring_buffer__consume() periodically, the application collects events as soon as they become available and handles them in near real time. Within the loop, the sleep(1) call limits CPU consumption by imposing a one-second delay between consume calls.
After building and running the program, you'll see the results, including the retrieved process name and PID.
Finally, this article has provided a thorough overview of eBPF (extended Berkeley Packet Filter) and its critical role in system call tracing. We've discussed the move from BPF to eBPF, why Falco utilizes it, and how to work with eBPF programs and ring buffers to ensure efficient data transfer between user-space applications and the kernel.
As we explored its capabilities, we saw that eBPF surpasses traditional kernel modules in terms of safety and observability, while JIT compilation keeps its performance close to native code. eBPF allows us to monitor and evaluate system calls in real time in a secure and efficient manner, making it an essential tool in today's cloud-native systems.
This was originally posted on the Falco blog by our engineers Ashutosh More and Rakshit Awasthi.