eBPF is a powerful technology that provides efficient, dynamic event tracing in an operating system's kernel space. In this article, we'll take a deep dive into eBPF (extended Berkeley Packet Filter) and its critical role in system call tracing.
BPF, short for Berkeley Packet Filter, made significant strides when it was introduced in the Linux kernel. It evolved from a simple packet-filtering tool into a more versatile and powerful framework for various network-related tasks, driven by the need for efficient packet processing and analysis in modern network environments.
eBPF, or Extended Berkeley Packet Filter, took BPF to a whole new level. It enables sandboxed programs to run inside the operating system kernel, which means application developers can load eBPF programs that extend the kernel's functionality at runtime. The operating system guarantees both safety and execution efficiency close to natively compiled code, thanks to an in-kernel verification engine and a Just-In-Time (JIT) compiler.
Compilation: A compiler translates the eBPF program into bytecode, which a loader program then reads.
Verification: The eBPF verifier examines the program for safety, correctness, and adherence to a set of rules and restrictions. It guarantees that the program does not violate memory-access constraints or destabilize the kernel.
Loading: Once verified, the eBPF program can be loaded into the kernel. The loader ensures that the program is securely loaded and attached to its intended hooks or targets.
Optimization: At runtime, Just-In-Time (JIT) compilation further optimizes the program by translating the eBPF bytecode into machine code that the CPU can execute directly.
eBPF programs are rigorously verified before they are loaded, adding an extra layer of protection against security flaws. Kernel modules, on the other hand, have direct access to kernel code and memory, and can pose a hazard if not written carefully.
JIT compilation of eBPF programs into machine code yields performance well suited to the CPU architecture. However, because kernel modules invoke kernel code directly without going through the BPF subsystem, eBPF instrumentation imposes somewhat greater overhead on the system than kernel module instrumentation.
eBPF supports comprehensive tracing and observability. Programs can be attached to a variety of events, such as system calls, network packets, or other kernel operations, to provide deep insights into system activity. As a result, eBPF is an excellent tool for performance analysis, security monitoring, and debugging. Kernel modules, on the other hand, frequently need more complicated and invasive techniques to attain comparable observability.
The Linux kernel defines a number of instrumentation points where additional code can be inserted to gather data about program execution, and eBPF allows instrumentation code to be injected at these points at runtime. Instrumentation points include tracepoints, kernel probes (kprobes), return probes (kretprobes), and user-space probes (uprobes).
For example, here's an eBPF program that runs whenever the execve system call is invoked.
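A minimal sketch of such a program might look like the following, assuming libbpf's `bpf_helpers.h` header is available (the function name is illustrative):

```c
// execve tracer sketch -- compiled with clang's BPF target.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

// Attach to the tracepoint fired on entry to execve().
SEC("tp/syscalls/sys_enter_execve")
int trace_execve(void *ctx)
{
    // The upper 32 bits of this value hold the process ID.
    __u32 pid = bpf_get_current_pid_tgid() >> 32;

    // Write a message to the kernel trace pipe.
    bpf_printk("execve called by PID %d", pid);
    return 0;
}

// eBPF programs must declare a GPL-compatible license to use
// helpers such as bpf_printk().
char LICENSE[] SEC("license") = "GPL";
```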
The SEC() macro from the bpf/bpf_helpers.h header file is critical in the context of eBPF programming. It allows a programmer to select the section of the eBPF object file into which a function or variable will be placed. This is necessary because the mechanisms for loading eBPF programs into the kernel, such as the bpf() system call, rely on these named sections.
An eBPF loader can rapidly discover and load the relevant code and data because functions and variables are arranged into named sections. For tracepoint events, the section name follows the pattern SEC("tp/<category>/<name>"), where <category> and <name> denote the tracepoint event category and name, respectively.
For example, tp/syscalls/sys_enter_execve is a tracepoint that captures when a process calls the execve system call.
The file /sys/kernel/debug/tracing/available_events contains a list of available tracepoints. Each line in the file follows the format <category>:<name>, such as syscalls:sys_enter_execve.
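To check whether a particular tracepoint exists, you can search this file. For example:

```shell
# List execve-related tracepoints (reading this file
# typically requires root privileges).
grep execve /sys/kernel/debug/tracing/available_events
```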
Use the following command to compile the program:
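A typical clang invocation looks like this (the file names are illustrative):

```shell
# -target bpf emits eBPF bytecode; -g includes BTF debug
# information needed by some loaders.
clang -O2 -g -target bpf -c execve_tracer.bpf.c -o execve_tracer.bpf.o
```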
As a result, a loader program is required to load and attach this eBPF program. The loader is responsible for opening and loading the eBPF object file, checking for errors, locating and attaching the specific eBPF program within the loaded object, and finally closing it. Once attached, the eBPF program executes whenever the corresponding events occur. The loader then enters an indefinite loop, ensuring that the program keeps running until it is stopped manually.
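A sketch of such a loader using libbpf might look like the following (the object file, program, and function names are assumptions; link with -lbpf):

```c
// loader.c -- illustrative libbpf-based loader sketch.
#include <stdio.h>
#include <unistd.h>
#include <bpf/libbpf.h>

int main(void)
{
    // Open and load the compiled eBPF object file.
    struct bpf_object *obj =
        bpf_object__open_file("execve_tracer.bpf.o", NULL);
    if (!obj || bpf_object__load(obj)) {
        fprintf(stderr, "failed to open or load BPF object\n");
        return 1;
    }

    // Locate the eBPF program by its function name and attach it
    // to the tracepoint declared in its SEC() annotation.
    struct bpf_program *prog =
        bpf_object__find_program_by_name(obj, "trace_execve");
    struct bpf_link *link = prog ? bpf_program__attach(prog) : NULL;
    if (!link) {
        fprintf(stderr, "failed to attach BPF program\n");
        bpf_object__close(obj);
        return 1;
    }

    // Keep the process alive so the eBPF program stays attached;
    // stop with Ctrl-C.
    while (1)
        sleep(1);

    bpf_link__destroy(link);
    bpf_object__close(obj);
    return 0;
}
```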
To retrieve logs generated by the bpf_printk function, you can read the file: /sys/kernel/tracing/trace_pipe.
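For example (root privileges are typically required):

```shell
# Stream bpf_printk output; on older kernels the path may be
# /sys/kernel/debug/tracing/trace_pipe instead.
sudo cat /sys/kernel/tracing/trace_pipe
```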
However, manually reading messages from the trace pipe may not be the most efficient approach. It is advantageous to establish a mechanism through which the eBPF program sends messages directly to the loader program. One possible solution is to use ring buffers. Let's delve into the details of ring buffers.
This ring buffer facilitates the transmission of events and data between eBPF programs running in the kernel and user-space applications. It works as a multi-producer, single-consumer (MPSC) queue, allowing safe concurrent sharing across several CPUs.
The eBPF ring buffer is shared across all CPUs, which offers an efficient way to control memory usage and addresses issues with the older perf buffer (perfbuf), including memory overuse and under-allocation.
Here are a few key libbpf functions for writing an eBPF application that transfers data to user-space via a ring buffer:
bpf_ringbuf_reserve(): This function reserves a specified amount of space (in bytes) in a BPF ring buffer.
bpf_probe_read_user_str(): This function reads a null-terminated string from user-space memory into the destination (dst) buffer in kernel space.
bpf_ringbuf_submit(): This function submits data that was previously reserved in a ring buffer.
bpf_object__find_map_fd_by_name(): This function finds the file descriptor of a named map.
bpf_program__attach(): This function attaches an eBPF program to its declared hook, such as a kernel tracepoint.
ring_buffer__new(): This function creates and opens a new ring buffer manager.
ring_buffer__consume(): This function removes, or consumes, data from a ring buffer.
BTF (BPF Type Format) provides a means to describe the data types used by eBPF programs. This enhances type safety, debugging, and introspection.
Now, let's create a program that sends data to user-space using a ring buffer:
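A sketch of such a program is shown below, assuming libbpf's `bpf_helpers.h` header. The map name, event struct, and tracepoint context layout are illustrative; the context struct mirrors the fields described in the tracepoint's `format` file:

```c
// ring buffer tracer sketch -- compiled with clang's BPF target.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define TASK_COMM_LEN 16
#define MAX_FILENAME_LEN 256

// Event layout shared with the user-space loader.
struct event {
    int pid;
    char comm[TASK_COMM_LEN];
    char filename[MAX_FILENAME_LEN];
};

// Ring buffer map, shared across all CPUs; 256 KiB of storage.
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 256 * 1024);
} rb SEC(".maps");

// Hand-written tracepoint context for sys_enter_execve; normally
// generated from BTF (e.g. via vmlinux.h).
struct sys_enter_execve_ctx {
    unsigned long long unused;
    long syscall_nr;
    const char *filename;
    const char *const *argv;
    const char *const *envp;
};

SEC("tp/syscalls/sys_enter_execve")
int trace_execve(struct sys_enter_execve_ctx *ctx)
{
    // Reserve space in the ring buffer for one event.
    struct event *e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0);
    if (!e)
        return 0;

    e->pid = bpf_get_current_pid_tgid() >> 32;
    bpf_get_current_comm(&e->comm, sizeof(e->comm));
    // Copy the executable path from user-space memory.
    bpf_probe_read_user_str(&e->filename, sizeof(e->filename),
                            ctx->filename);

    // Make the event visible to the user-space consumer.
    bpf_ringbuf_submit(e, 0);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```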
Once the program is created, you can write a loader to load this eBPF program:
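A loader sketch using libbpf might look like this (object file, map, and program names are assumptions and must match the kernel side; link with -lbpf):

```c
// ring buffer loader sketch using libbpf.
#include <stdio.h>
#include <unistd.h>
#include <bpf/libbpf.h>

struct event {                 // must match the eBPF side
    int pid;
    char comm[16];
    char filename[256];
};

// Callback invoked for every event consumed from the ring buffer.
static int handle_event(void *ctx, void *data, size_t len)
{
    const struct event *e = data;
    printf("pid=%d comm=%s file=%s\n", e->pid, e->comm, e->filename);
    return 0;
}

int main(void)
{
    struct bpf_object *obj =
        bpf_object__open_file("ringbuf_tracer.bpf.o", NULL);
    if (!obj || bpf_object__load(obj)) {
        fprintf(stderr, "failed to open or load BPF object\n");
        return 1;
    }

    struct bpf_program *prog =
        bpf_object__find_program_by_name(obj, "trace_execve");
    struct bpf_link *link = prog ? bpf_program__attach(prog) : NULL;
    if (!link) {
        fprintf(stderr, "failed to attach BPF program\n");
        return 1;
    }

    // Look up the ring buffer map's fd and create a consumer.
    int map_fd = bpf_object__find_map_fd_by_name(obj, "rb");
    struct ring_buffer *rb =
        ring_buffer__new(map_fd, handle_event, NULL, NULL);
    if (!rb) {
        fprintf(stderr, "failed to create ring buffer manager\n");
        return 1;
    }

    // Poll for events until interrupted.
    while (1) {
        ring_buffer__consume(rb);
        sleep(1);
    }

    ring_buffer__free(rb);
    bpf_link__destroy(link);
    bpf_object__close(obj);
    return 0;
}
```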
The loader's endless loop is critical for continually checking the ring buffer for new events. Without it, the application would only consume events that were already in the buffer when ring_buffer__consume() was first called. By looping and calling ring_buffer__consume() periodically, the application collects events as soon as they become available and handles them in near real time. Within the loop, the sleep(1) call limits CPU consumption by imposing a one-second delay between consume calls.
After building and running the program, you'll see the results, including the retrieved process name and PID.
Finally, this article has provided a thorough overview of eBPF (extended Berkeley Packet Filter) and its critical role in system call tracing. We've discussed the move from BPF to eBPF, why Falco utilizes it, and how to work with eBPF programs and ring buffers to ensure efficient data transfer between user-space applications and the kernel.
As we explored its capabilities, we saw that eBPF surpasses traditional kernel modules in terms of safety and observability, while JIT compilation keeps its performance close to native code. eBPF allows us to monitor and evaluate system calls in real time in a secure and efficient manner, making it an essential tool in today's cloud-native systems.
This was originally posted on the Falco blog by our engineers Ashutosh More and Rakshit Awasthi.