Initial OSS-Fuzz Integration and First Fuzzing Test

Introduces an initial fuzzing test and supporting files for
integrating Dulwich into OSS-Fuzz as discussed in:
https://github.com/jelmer/dulwich/issues/1302

The corresponding PR on the OSS-Fuzz repo is:
https://github.com/google/oss-fuzz/pull/11900

David Lakin, 10 months ago
commit 35792cdd46

+ 2 - 0
.gitignore

@@ -27,3 +27,5 @@ docs/api/*.txt
 dulwich.dist-info
 .stestr
 target/
+# Files created by OSS-Fuzz when running locally
+fuzz_*.pkg.spec

+ 2 - 0
NEWS

@@ -4,6 +4,8 @@
 
  * Ship ``tests/`` and ``testdata/`` in sdist. (Jelmer Vernooij, #1292)
 
+ * Add initial integration with OSS-Fuzz for continuous fuzz testing and first fuzzing test (David Lakin, #1302)
+
 0.22.1	2024-04-23
 
  * Handle alternate case for worktreeconfig setting (Will Shanks, #1285)

+ 190 - 0
fuzzing/README.md

@@ -0,0 +1,190 @@
+# Fuzzing Dulwich
+
+[![Fuzzing Status](https://oss-fuzz-build-logs.storage.googleapis.com/badges/dulwich.svg)][oss-fuzz-issue-tracker]
+
+This directory contains files related to Dulwich's suite of fuzz tests that are executed daily on automated
+infrastructure provided by [OSS-Fuzz][oss-fuzz-repo]. This document aims to provide necessary information for working
+with fuzzing in Dulwich.
+
+The latest details regarding OSS-Fuzz test status, including build logs and coverage reports, are available
+on [the Open Source Fuzzing Introspection website](https://introspector.oss-fuzz.com/project-profile?project=dulwich).
+
+## How to Contribute
+
+There are many ways to contribute to Dulwich's fuzzing efforts! Contributions are welcomed through issues,
+discussions, or pull requests on this repository.
+
+Areas that are particularly appreciated include:
+
+- **Tackling the existing backlog of open issues**. While fuzzing is an effective way to identify bugs, that information
+  isn't useful unless the bugs are fixed. If you are not sure where to start, the issues tab is a great place to get ideas!
+- **Improvements to this (or other) documentation** make it easier for new contributors to get involved, so even small
+  improvements can have a large impact over time. If you see something that could be made easier by a documentation
+  update of any size, please consider suggesting it!
+
+For everything else, such as expanding test coverage, optimizing test performance, or enhancing error detection
+capabilities, jump into the "Getting Started" section below.
+
+## Getting Started with Fuzzing Dulwich
+
+> [!TIP]
+> **New to fuzzing or unfamiliar with OSS-Fuzz?**
+>
+> These resources are an excellent place to start:
+>
+> - [OSS-Fuzz documentation][oss-fuzz-docs] - Continuous fuzzing service for open source software.
+> - [Google/fuzzing][google-fuzzing-repo] - Tutorials, examples, discussions, research proposals, and other resources
+    related to fuzzing.
+> - [CNCF Fuzzing Handbook](https://github.com/cncf/tag-security/blob/main/security-fuzzing-handbook/handbook-fuzzing.pdf) -
+    A comprehensive guide for fuzzing open source software.
+> - [Efficient Fuzzing Guide by The Chromium Project](https://chromium.googlesource.com/chromium/src/+/main/testing/libfuzzer/efficient_fuzzing.md) -
+    Explores strategies to enhance the effectiveness of your fuzz tests, recommended for those looking to optimize their
+    testing efforts.
+
+### Setting Up Your Local Environment
+
+Before contributing to fuzzing efforts, ensure Python and Docker are installed on your machine. Docker is required for
+running fuzzers in containers provided by OSS-Fuzz. [Install Docker](https://docs.docker.com/get-docker/) following the official guide if you do not already have it.
+
+### Understanding Existing Fuzz Targets
+
+Review the `fuzz-targets/` directory to familiarize yourself with how existing tests are implemented. See
+the [Files & Directories Overview](#files--directories-overview) for more details on the directory structure.
+
+### Contributing to Fuzz Tests
+
+Start by reviewing the [Atheris documentation][atheris-repo] and the section
+on [Running Fuzzers Locally](#running-fuzzers-locally) to begin writing or improving fuzz tests.
+
+## Files & Directories Overview
+
+The `fuzzing/` directory is organized into three key areas:
+
+### Fuzz Targets (`fuzz-targets/`)
+
+Contains Python files for each fuzz test.
+
+**Things to Know**:
+
+- Each fuzz test targets a specific part of Dulwich's functionality.
+- Test files adhere to the naming convention: `fuzz_<API Under Test>.py`, where `<API Under Test>` indicates the
+  functionality targeted by the test.
+- Any functionality that involves performing operations on input data is a possible candidate for fuzz testing, but
+  features that involve processing untrusted user input or parsing operations are typically going to be the most
+  interesting.
+- The goal of these tests is to identify previously unknown or unexpected error cases caused by a given input. For that
+  reason, fuzz tests should gracefully handle anticipated exception cases with a `try`/`except` block to avoid false
+  positives that halt the fuzzing engine.
+
+### Dictionaries (`dictionaries/`)
+
+Provides hints to the fuzzing engine about inputs that might trigger unique code paths. Each fuzz target may have a
+corresponding `.dict` file. For information about dictionary syntax, refer to
+the [LibFuzzer documentation on the subject](https://llvm.org/docs/LibFuzzer.html#dictionaries).
+
+**Things to Know**:
+
+- OSS-Fuzz loads a dictionary file for a fuzz target only if one exists with the same name as the target; all others are ignored.
+- Most entries in the dictionary files found here are escaped byte values that were recommended by the fuzzing
+  engine after previous runs.
+- A default set of dictionary entries is created for all fuzz targets as part of the build process, regardless of an
+  existing file here.
+- Development or updates to dictionaries should reflect the varied formats and edge cases relevant to the
+  functionalities under test.
+- Example dictionaries (some of which are used to build the default dictionaries mentioned above) can be found here:
+  - [AFL++ dictionary repository](https://github.com/AFLplusplus/AFLplusplus/tree/stable/dictionaries#readme)
+  - [Google/fuzzing dictionary repository](https://github.com/google/fuzzing/tree/master/dictionaries)
+
+### OSS-Fuzz Scripts (`oss-fuzz-scripts/`)
+
+Includes scripts for building and integrating fuzz targets with OSS-Fuzz:
+
+- **`container-environment-bootstrap.sh`** - Sets up the execution environment. It is responsible for fetching default
+  dictionary entries and ensuring all required build dependencies are installed and up-to-date.
+- **`build.sh`** - Executed within the Docker container, this script builds fuzz targets with necessary instrumentation
+  and prepares seed corpora and dictionaries for use.
+
+**Where to learn more:**
+
+- [OSS-Fuzz documentation on the build.sh](https://google.github.io/oss-fuzz/getting-started/new-project-guide/#buildsh)
+- [See Dulwich's build.sh and Dockerfile in the OSS-Fuzz repository](https://github.com/google/oss-fuzz/tree/master/projects/dulwich)
+
+## Running Fuzzers Locally
+
+This approach uses Docker images provided by OSS-Fuzz for building and running fuzz tests locally. It offers
+comprehensive features but requires a local clone of the OSS-Fuzz repository and sufficient disk space for Docker
+containers.
+
+### Build the Execution Environment
+
+Clone the OSS-Fuzz repository and prepare the Docker environment:
+
+```shell
+git clone --depth 1 https://github.com/google/oss-fuzz.git oss-fuzz
+cd oss-fuzz
+python infra/helper.py build_image dulwich
+python infra/helper.py build_fuzzers --sanitizer address dulwich
+```
+
+> [!TIP]
+> The `build_fuzzers` command above accepts a local file path pointing to your Dulwich repository clone as the last
+> argument.
+> This makes it easy to build fuzz targets you are developing locally in this repository without changing anything in
+> the OSS-Fuzz repo!
+> For example, if you have cloned this repository (or a fork of it) into: `~/code/dulwich`
+> Then running this command would build new or modified fuzz targets using the `~/code/dulwich/fuzzing/fuzz-targets`
+> directory:
+> ```shell
+> python infra/helper.py build_fuzzers --sanitizer address dulwich ~/code/dulwich
+> ```
+
+Verify the build of your fuzzers with the optional `check_build` command:
+
+```shell
+python infra/helper.py check_build dulwich
+```
+
+### Run a Fuzz Target
+
+Setting an environment variable for the fuzz target argument of the execution command makes it easier to quickly select
+a different target between runs:
+
+```shell
+# specify the fuzz target without the .py extension:
+export FUZZ_TARGET=fuzz_configfile
+```
+
+Execute the desired fuzz target:
+
+```shell
+python infra/helper.py run_fuzzer dulwich $FUZZ_TARGET -- -max_total_time=60 -print_final_stats=1
+```
+
+> [!TIP]
+> In the example above, the "`-- -max_total_time=60 -print_final_stats=1`" portion of the command is optional but quite
+> useful.
+>
+> Every argument provided after "`--`" in the above command is passed to the fuzzing engine directly. In this case:
+> - `-max_total_time=60` tells LibFuzzer to stop execution after 60 seconds have elapsed.
+> - `-print_final_stats=1` tells LibFuzzer to print a summary of useful metrics about the target run upon
+    completion.
+>
+> But almost any [LibFuzzer option listed in the documentation](https://llvm.org/docs/LibFuzzer.html#options) should
+> work as well.
+
+#### Next Steps
+
+For detailed instructions on advanced features like reproducing OSS-Fuzz issues or using the Fuzz Introspector, refer
+to [the official OSS-Fuzz documentation][oss-fuzz-docs].
+
+
+
+[oss-fuzz-repo]: https://github.com/google/oss-fuzz
+
+[oss-fuzz-docs]: https://google.github.io/oss-fuzz
+
+[oss-fuzz-issue-tracker]: https://bugs.chromium.org/p/oss-fuzz/issues/list?sort=-opened&can=1&q=proj:dulwich
+
+[google-fuzzing-repo]: https://github.com/google/fuzzing
+
+[atheris-repo]: https://github.com/google/atheris
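
Note on naming: the README above ties three names together: the harness file `fuzz_<API Under Test>.py`, the `FUZZ_TARGET` value (the file name without `.py`), and the dictionary that OSS-Fuzz loads when it shares the target's name. A minimal sketch of that mapping follows; `target_names` is a hypothetical helper for illustration and is not part of Dulwich or OSS-Fuzz.

```python
from pathlib import Path
from typing import Tuple


def target_names(harness_path: str) -> Tuple[str, str]:
    """Return (fuzz target name, dictionary file name) for a harness file."""
    path = Path(harness_path)
    # The README's convention: harness files are named fuzz_<API Under Test>.py.
    if path.suffix != ".py" or not path.stem.startswith("fuzz_"):
        raise ValueError(f"not a fuzz harness name: {harness_path!r}")
    # The target name is the file name without .py; the dictionary shares that name.
    return path.stem, f"{path.stem}.dict"


target, dictionary = target_names("fuzzing/fuzz-targets/fuzz_configfile.py")
print(target, dictionary)
```

The first value is what the README exports as `FUZZ_TARGET`; the second is the file name to use under `fuzzing/dictionaries/`.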

+ 31 - 0
fuzzing/dictionaries/fuzz_configfile.dict

@@ -0,0 +1,31 @@
+"\\357\\273\\277"
+"\\\\\\015\\012"
+"\\001\\000"
+"\\000\\000\\000\\000"
+"\\001\\000\\000\\000"
+"\\377h"
+"-\\000\\000\\000\\000\\000\\000\\000"
+"[\\000\\000\\000\\000\\000\\000\\000"
+"H]\\000"
+"2\\000\\000\\000\\000\\000\\000\\000"
+"\\377\\377\\377\\377\\377\\377\\377;"
+"]\\377"
+"\\000\\000\\000\\000\\000\\000\\000B"
+"\\\\\\012"
+"\\000\\000\\000\\000\\000\\000\\0001"
+"rue"
+"b\\271\\""
+"\\000\\000\\000\\000\\000\\000\\000]"
+"\\\\\\000\\000\\000\\000\\000\\000\\000"
+"\\330\\330"
+"\\000\\000\\000\\000\\000\\000\\000\\000"
+"\\377\\377\\377\\377"
+"%\\000\\000\\000\\000\\000\\000\\000"
+"\\000\\000\\000\\000\\000\\000\\000\\\\"
+"\\377\\377\\377\\377\\377\\377\\377$"
+"[\\000\\000\\000\\000\\000\\000\\000"
+"p\\012"
+"\\001\\000\\000\\000\\000\\000\\000\\""
+"\\337\\000\\000\\000\\000\\000\\000\\000"
+"\\001\\000\\000\\000\\000\\000\\000\\000"
+"\\\\0="
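
Note on the dictionary format: the entries above use the LibFuzzer dictionary syntax linked from the README, in which each entry is a double-quoted string and the documented escapes are `\\`, `\"`, and `\xAB`. Under that reading, a doubled backslash decodes to one literal backslash, so these entries are backslash-and-digit text rather than raw byte values. The sketch below is a simplified, hypothetical checker (`parse_dict_entry` is an illustrative name, not part of Dulwich or LibFuzzer) that approximates those documented rules; it is not the engine's own parser.

```python
import string
from typing import Optional


def parse_dict_entry(line: str) -> Optional[bytes]:
    """Decode one dictionary line; return None for blank lines and comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    start = stripped.find('"')
    # An entry needs an opening quote and a separate closing quote at the end.
    if start == -1 or start == len(stripped) - 1 or not stripped.endswith('"'):
        raise ValueError(f"not a quoted entry: {line!r}")
    body = stripped[start + 1 : -1]
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            # Ordinary character: copy it through unchanged.
            out += ch.encode("utf-8")
            i += 1
            continue
        nxt = body[i + 1 : i + 2]
        hex_digits = body[i + 2 : i + 4]
        if nxt in ("\\", '"'):
            # Escaped backslash or escaped double quote.
            out += nxt.encode("ascii")
            i += 2
        elif nxt == "x" and len(hex_digits) == 2 and all(c in string.hexdigits for c in hex_digits):
            # Two-digit hexadecimal byte escape.
            out.append(int(hex_digits, 16))
            i += 4
        else:
            raise ValueError(f"unsupported escape in: {line!r}")
    return bytes(out)


# Raw strings keep the backslashes exactly as they appear in a .dict file.
print(parse_dict_entry(r'"\\357\\273\\277"'))
print(parse_dict_entry(r'"\xEF\xBB\xBF"'))
```

Running a check like this over each line of a hand-edited `.dict` file may catch a missing quote or a stray escape before a fuzzing build does.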

+ 41 - 0
fuzzing/fuzz-targets/fuzz_configfile.py

@@ -0,0 +1,41 @@
+import atheris
+import sys
+from io import BytesIO
+
+with atheris.instrument_imports():
+    from dulwich.config import ConfigFile
+
+
+def is_expected_error(error_list, error_msg):
+    for error in error_list:
+        if error in error_msg:
+            return True
+    return False
+
+
+def TestOneInput(data):
+    try:
+        ConfigFile.from_file(BytesIO(data))
+    except ValueError as e:
+        expected_errors = [
+            "without section",
+            "invalid variable name",
+            "expected trailing ]",
+            "invalid section name",
+            "Invalid subsection",
+            "escape character",
+            "missing end quote",
+        ]
+        if is_expected_error(expected_errors, str(e)):
+            return -1
+        else:
+            raise e
+
+
+def main():
+    atheris.Setup(sys.argv, TestOneInput)
+    atheris.Fuzz()
+
+
+if __name__ == "__main__":
+    main()
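
Note on the error filter: the harness above treats a `ValueError` whose message contains a known phrase as an anticipated parse failure and returns `-1`, which, per the Atheris documentation, asks the engine not to add that input to the corpus. Any other exception propagates and is reported as a finding. The sketch below isolates that pattern so it can be run without Atheris or Dulwich installed; `run_one_input` and `toy_parser` are hypothetical stand-ins, and the error list is only a subset of the one in the harness.

```python
from typing import Callable, Optional, Sequence

# Subset of the message fragments listed in the harness above.
EXPECTED_ERRORS = ("without section", "invalid variable name")


def run_one_input(
    parse: Callable[[bytes], object],
    data: bytes,
    expected: Sequence[str] = EXPECTED_ERRORS,
) -> Optional[int]:
    """Call parse(data); return -1 for anticipated errors, re-raise the rest."""
    try:
        parse(data)
    except ValueError as e:
        if any(fragment in str(e) for fragment in expected):
            return -1
        raise  # a bare raise preserves the original traceback
    return None


def toy_parser(data: bytes) -> None:
    """Hypothetical parser standing in for ConfigFile.from_file."""
    if data.startswith(b"="):
        raise ValueError("setting without section")
    if b"\x00" in data:
        raise ValueError("unexpected NUL byte")


print(run_one_input(toy_parser, b"=value"))
print(run_one_input(toy_parser, b"[core]"))
```

Matching on message fragments keeps the filter narrow: a new `ValueError` with an unfamiliar message still surfaces as a crash instead of being silently ignored.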

+ 37 - 0
fuzzing/oss-fuzz-scripts/build.sh

@@ -0,0 +1,37 @@
+# shellcheck shell=bash
+
+set -euo pipefail
+
+python3 -m pip install .
+
+# Directory to look in for dictionaries, options files, and seed corpora:
+SEED_DATA_DIR="$SRC/seed_data"
+
+find "$SEED_DATA_DIR" \( -name '*_seed_corpus.zip' -o -name '*.options' -o -name '*.dict' \) \
+  ! \( -name '__base.*' \) -exec printf 'Copying: %s\n' {} \; \
+  -exec chmod a-x {} \; \
+  -exec cp {} "$OUT" \;
+
+# Build fuzzers in $OUT.
+find "$SRC/dulwich/fuzzing" -name 'fuzz_*.py' -print0 | while IFS= read -r -d '' fuzz_harness; do
+  compile_python_fuzzer "$fuzz_harness"
+
+  common_base_dictionary_filename="$SEED_DATA_DIR/__base.dict"
+  if [[ -r "$common_base_dictionary_filename" ]]; then
+    # Strip the `.py` extension from the filename and replace it with `.dict`.
+    fuzz_harness_dictionary_filename="$(basename "$fuzz_harness" .py).dict"
+    output_file="$OUT/$fuzz_harness_dictionary_filename"
+
+    printf 'Appending %s to %s\n' "$common_base_dictionary_filename" "$output_file"
+    if [[ -s "$output_file" ]]; then
+      # If a dictionary file for this fuzzer already exists and is not empty,
+      # we append a new line to the end of it before appending any new entries.
+      #
+      # LibFuzzer will happily ignore multiple empty lines in a dictionary but fail with an error
+      # if any single line has incorrect syntax (e.g., if we accidentally add two entries to the same line.)
+      # See docs for valid syntax: https://llvm.org/docs/LibFuzzer.html#id32
+      echo >>"$output_file"
+    fi
+    cat "$common_base_dictionary_filename" >>"$output_file"
+  fi
+done
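
Note on the dictionary merge: the loop above appends the shared `__base.dict` to each target's dictionary and writes a newline first when the target's file already has content, so two entries never end up on the same line. As a rough Python rendering of the same idea (an illustration, not a replacement for the script), with a hypothetical function name and paths:

```python
from pathlib import Path


def append_base_dictionary(base: Path, target: Path) -> None:
    """Append base's contents to target, separating existing content with a newline."""
    if not base.is_file():
        # Mirrors the script's readability check: nothing to append.
        return
    # Decide before opening, because append mode creates the file if it is missing.
    needs_separator = target.exists() and target.stat().st_size > 0
    with target.open("ab") as out:
        if needs_separator:
            # Extra blank lines are harmless in a dictionary; joined entries are not.
            out.write(b"\n")
        out.write(base.read_bytes())
```

If the per-target dictionary already ends with a newline, the separator produces a blank line, which the comment in the script notes LibFuzzer ignores.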

+ 55 - 0
fuzzing/oss-fuzz-scripts/container-environment-bootstrap.sh

@@ -0,0 +1,55 @@
+#!/usr/bin/env bash
+
+set -euo pipefail
+
+#################
+# Prerequisites #
+#################
+
+for cmd in python3 git wget rsync; do
+  command -v "$cmd" >/dev/null 2>&1 || {
+    printf '[%s] Required command %s not found, exiting.\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$cmd" >&2
+    exit 1
+  }
+done
+
+SEED_DATA_DIR="$SRC/seed_data"
+mkdir -p "$SEED_DATA_DIR"
+
+#############
+# Functions #
+#############
+
+download_and_concatenate_common_dictionaries() {
+  # Assign the first argument as the target file where all contents will be concatenated
+  target_file="$1"
+
+  # Shift the arguments so the first argument (target_file path) is removed
+  # and only URLs are left for the loop below.
+  shift
+
+  for url in "$@"; do
+    wget -qO- "$url" >>"$target_file"
+    # Ensure there's a newline between each file's content
+    echo >>"$target_file"
+  done
+}
+
+fetch_seed_data() {
+  rsync -avc "$SRC/dulwich/fuzzing/dictionaries/" "$SEED_DATA_DIR/"
+}
+
+########################
+# Main execution logic #
+########################
+
+fetch_seed_data
+
+download_and_concatenate_common_dictionaries "$SEED_DATA_DIR/__base.dict" \
+  "https://raw.githubusercontent.com/google/fuzzing/master/dictionaries/utf8.dict" \
+  "https://raw.githubusercontent.com/google/fuzzing/master/dictionaries/url.dict"
+
+# The OSS-Fuzz base image has outdated dependencies by default so we upgrade them below.
+python3 -m pip install --upgrade pip
+# Upgrade to the latest versions known to work at the time the below changes were introduced:
+python3 -m pip install 'setuptools~=69.0' 'pyinstaller~=6.0'