
Finding and Preventing Vulnerabilities in C Programming: A Guide to Building Robust Code


For EIP-4844, Ethereum clients need the ability to compute and verify KZG commitments. Rather than each client rolling their own crypto, researchers and developers came together to write c-kzg-4844, a relatively small C library with bindings for higher-level languages. The idea was to create a robust and efficient cryptographic library that all clients could use.

The Protocol Security Research team at the Ethereum Foundation had the opportunity to review and improve this library. This blog post will discuss some things we do to make C projects more secure.

Fuzz
Fuzzing is a dynamic code testing technique that involves providing random inputs to discover bugs in a program. LibFuzzer and afl++ are two popular fuzzing frameworks for C projects. They are both in-process, coverage-guided, evolutionary fuzzing engines. For c-kzg-4844, we used LibFuzzer since we were already well-integrated with the LLVM project’s other offerings. Here’s the fuzzer for verify_kzg_proof, one of c-kzg-4844’s functions:

```c
#include "../base_fuzz.h"

/* The fuzzer input is the four arguments laid out back to back. */
static const size_t COMMITMENT_OFFSET = 0;
static const size_t Z_OFFSET = COMMITMENT_OFFSET + BYTES_PER_COMMITMENT;
static const size_t Y_OFFSET = Z_OFFSET + BYTES_PER_FIELD_ELEMENT;
static const size_t PROOF_OFFSET = Y_OFFSET + BYTES_PER_FIELD_ELEMENT;
static const size_t INPUT_SIZE = PROOF_OFFSET + BYTES_PER_PROOF;

int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
    initialize();
    /* Only inputs of exactly the expected length reach the function under test. */
    if (size == INPUT_SIZE) {
        bool ok;
        verify_kzg_proof(
            &ok,
            (const Bytes48 *)(data + COMMITMENT_OFFSET),
            (const Bytes32 *)(data + Z_OFFSET),
            (const Bytes32 *)(data + Y_OFFSET),
            (const Bytes48 *)(data + PROOF_OFFSET),
            &s
        );
    }
    return 0;
}
```

When executed, this is what the output looks like. If there were a problem, it would write the input to disk and stop executing. Ideally, you should be able to reproduce the problem.

There’s also differential fuzzing, which is a technique that fuzzes two or more implementations of the same interface and compares the outputs. For a given input, if the output is different, and you expected them to be the same, you know something is wrong. This technique is very popular in Ethereum because we like to have several implementations of the same thing. This diversification provides an extra level of safety, knowing that if one implementation were flawed the others may not have the same issue. For KZG libraries, we developed kzg-fuzz, which differentially fuzzes c-kzg-4844 (through its Golang bindings) and go-kzg-4844. So far, there haven’t been any differences.
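
As a rough illustration of the differential idea, the sketch below feeds the same input to two implementations of one function and aborts when they disagree. The two bit-counting functions (popcount_reference and popcount_fast) are hypothetical stand-ins chosen for brevity; they are unrelated to kzg-fuzz, which compares the real libraries from Go.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference implementation: checks one bit at a time. */
static unsigned popcount_reference(uint32_t x) {
    unsigned count = 0;
    for (int i = 0; i < 32; i++) {
        count += (x >> i) & 1u;
    }
    return count;
}

/* Hypothetical optimized implementation: clears the lowest set bit each round. */
static unsigned popcount_fast(uint32_t x) {
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;
        count++;
    }
    return count;
}

/* libFuzzer entry point: give both implementations the same input and compare. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    uint32_t x;
    if (size != sizeof(x)) {
        return 0; /* Ignore inputs of the wrong length. */
    }
    memcpy(&x, data, sizeof(x)); /* Avoids an unaligned read. */

    unsigned expected = popcount_reference(x);
    unsigned actual = popcount_fast(x);
    if (expected != actual) {
        fprintf(stderr, "mismatch for 0x%08x: %u != %u\n",
                (unsigned)x, expected, actual);
        abort(); /* The fuzzer treats this as a crash and saves the input. */
    }
    return 0;
}
```

With LibFuzzer, a file like this is typically built with clang’s -fsanitize=fuzzer flag, which supplies the main function and drives LLVMFuzzerTestOneInput.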

Coverage
Next, we used llvm-profdata and llvm-cov to generate a coverage report from running the tests. This is a great way to verify code is executed (“covered”) and tested. See the coverage target in c-kzg-4844’s Makefile for an example of how to generate this report. When this target is run (i.e., make coverage), it produces a table that serves as a high-level overview of how much of each function is executed. The exported functions are at the top and the non-exported (static) functions are on the bottom. There is a lot of green in the table above, but there is some yellow and red too. To determine what is and isn’t being executed, refer to the HTML file (coverage.html) that was generated. This webpage shows the entire source file and highlights non-executed code in red. In this project’s case, most of the non-executed code deals with hard-to-test error cases such as memory allocation failures. For example, here’s some non-executed code:

```c
if (!is_monomial_form(unsafe, PROOF_DEGREE + 1)) {
    return -1;
}
```

At the beginning of this function, it checks that the trusted setup is big enough to perform a pairing check. There isn’t a test case which provides an invalid trusted setup, so this doesn’t get executed. Also, because we only test with the correct trusted setup, the result of is_monomial_form is always the same and doesn’t return the error value.
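
One way to cover a branch like this is a negative test that deliberately passes invalid input. The following is a minimal sketch of that idea only; Setup, REQUIRED_POINTS, and check_setup are hypothetical names invented for illustration and are not part of c-kzg-4844.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimum number of setup points the check needs. */
#define REQUIRED_POINTS 2

/* Hypothetical stand-in for a trusted setup. */
typedef struct {
    size_t num_points;
} Setup;

/* Returns 0 on success and -1 if the setup is missing or too small. */
static int check_setup(const Setup *setup) {
    if (setup == NULL || setup->num_points < REQUIRED_POINTS) {
        return -1; /* Only executed when a test passes a bad setup. */
    }
    return 0;
}

/* Positive test: covers the success path only. */
static bool test_valid_setup(void) {
    Setup good = {.num_points = 4096};
    return check_setup(&good) == 0;
}

/* Negative test: covers the error path that a positive test never reaches. */
static bool test_invalid_setup(void) {
    Setup too_small = {.num_points = 1};
    return check_setup(&too_small) == -1 && check_setup(NULL) == -1;
}
```

If only test_valid_setup is called from the test runner, the return -1 line would show up as non-executed in the report; calling test_invalid_setup as well exercises it.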

Profile
We don’t recommend this for all projects, but since c-kzg-4844 is a performance-critical library, we think it’s important to profile its exported functions and measure how long they take to execute. This can help identify inefficiencies which could potentially be used to DoS nodes. For this, we used gperftools (Google Performance Tools) instead of llvm-xray because we found it to be more feature-rich and easier to use. The following is a simple example which profiles my_function. Profiling works by periodically sampling which instruction is being executed. If a function is fast enough, the profiler may not notice it. To reduce the chance of this, you may need to call your function multiple times. In this example, we call my_function 1000 times.

```c
#include <gperftools/profiler.h>

/* Unsigned arithmetic so that the large products wrap instead of overflowing. */
unsigned task_a(unsigned n) {
    if (n <= 1) return 1;
    return task_a(n - 1) * n;
}

int task_b(int n) {
    if (n <= 1) return 1;
    return task_b(n - 2) + n;
}

void my_function(void) {
    for (int i = 0; i < 500; i++) {
        if (i % 2 == 0) {
            task_a(i);
        } else {
            task_b(i);
        }
    }
}

int main(void) {
    ProfilerStart("example.prof");
    /* Repeat the call so the sampling profiler has enough time to observe it. */
    for (int i = 0; i < 1000; i++) {
        my_function();
    }
    ProfilerStop();
    return 0;
}
```

Use ProfilerStart("<filename>") and ProfilerStop() to mark which parts of your program to profile. When re-compiled and executed, it will write a file to disk with profiling data. You can then use pprof to visualize this data.

Reverse
Next, view your binary in a software reverse engineering (SRE) tool such as Ghidra or IDA. These tools can help you understand how high-level constructs are translated into low-level machine code. We think it helps to review your code this way, much like how reading a paper in a different font forces your brain to interpret sentences differently. It’s also useful to see what types of optimizations your compiler makes. It’s rare, but sometimes the compiler will optimize out something which it deemed unnecessary. Keep an eye out for this; it actually happened in c-kzg-4844, where some of the tests were being optimized out.

When you view a decompiled function, it will not have variable names, complex types, or comments, because this information isn’t included in the compiled binary. It will be up to you to reverse engineer this. You’ll often see that functions are inlined into a single function, multiple variables declared in code are optimized into a single buffer, and the order of checks is different. These are just compiler optimizations and are generally fine. It may help to build your binary with DWARF debugging information; most SRE tools can analyze this section to provide better results. For example, this is what blob_to_kzg_commitment initially looks like in Ghidra:

![Decompiled code](https://example.com/decompiled.png)

With a little work, you can rename variables and add comments to make it easier to read. Here’s what it could look like after a few minutes:

```cpp
MerkleTreeIdentifier blob_to_kzg_commitment(
    const TrustedSetup& setup,
    const std::vector<G1Point>& blob_path,
    const std::vector<Bytes48>& blob_hash_path
) {
    MerkleTreeIdentifier commitment;

    for (size_t i = 0; i < blob_path.size(); i++) {
        const G1Point& node = blob_path[i];
        const Bytes48& hashed_node = blob_hash_path[i];
        if (!is_compromised(setup, hashed_node)) {
            commitment.push_back(node);
        }
    }
    return commitment;
}
```

Static Analysis
Clang comes with the Clang Static Analyzer built in, which is an excellent static analysis tool that can identify many problems that the compiler will miss. As the name "static" suggests, it examines code without executing it. This is slower than the compiler, but a lot faster than "dynamic" analysis tools which execute code. Here's a simple example which forgets to free arr (and has another problem, but we will talk more about that later). The compiler will not identify this, even with all warnings enabled, because technically this is completely valid code.

```c
#include <stdlib.h>

int main(void) {
    int* arr = malloc(5 * sizeof(int));
    arr[5] = 42;
    return 0;
}
```

The unix.Malloc checker will identify that arr wasn’t freed. The line in the warning message is a bit misleading, but it makes sense if you think about it; the analyzer reached the return statement and noticed that the memory hadn’t been freed. Not all findings are that simple, though. Here’s an issue that the Clang Static Analyzer found in c-kzg-4844 when it was first introduced to the project:

![Clang Static Analyzer finding](https://example.com/static_analysis.png)

Given an unexpected input, it was possible to shift this value by 32 bits which is undefined behavior. The solution was to restrict the input with CHECK(log2_pow2(n) != 0) so that this was impossible. Good job, Clang Static Analyzer!
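
To make the hazard concrete, here is a small sketch of the same class of bug and guard. It is an illustration under assumptions, not the library’s code: log2_of_pow2 and top_bits are hypothetical names, and the guard returns false where c-kzg-4844 uses its CHECK macro.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: log2 of n, where n is assumed to be a power of two. */
static uint32_t log2_of_pow2(uint32_t n) {
    uint32_t position = 0;
    while (n >>= 1) {
        position++;
    }
    return position;
}

/*
 * Keeps the top log2(n) bits of value. Writes the result to *out and returns
 * true, or returns false when the shift amount would be 32.
 */
static bool top_bits(uint32_t *out, uint32_t n, uint32_t value) {
    uint32_t bits = log2_of_pow2(n);
    if (bits == 0) {
        return false; /* 32 - 0 == 32: shifting a uint32_t by 32 is undefined. */
    }
    *out = value >> (32 - bits);
    return true;
}
```

Without the bits == 0 check, an input of n = 1 would shift a 32-bit value by its full width, which the C standard leaves undefined.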

Sanitize
Sanitizers are dynamic analysis tools which instrument (add instructions) to programs which can point out issues during
