Contributors Guide¶
This guide details how to set up your development environment as a SedonaDB Contributor.
Fork and clone the repository¶
Your first step is to create a personal copy of the repository and connect it to the main project.
- Fork the repository
  - Navigate to the official SedonaDB GitHub repository.
  - Click the Fork button in the top-right corner. This creates a complete copy of the project in your own GitHub account.
- Clone your fork
  - Next, clone your newly created fork to your local machine. This command downloads the repository into a new folder named sedona-db.
  - Replace YourUsername with your actual GitHub username.
    git clone https://github.com/YourUsername/sedona-db.git
    cd sedona-db
- Configure the remotes
  - Your local repository needs to know where the original project is so you can pull in updates. You'll add a remote link, traditionally named upstream, to the main SedonaDB repository.
  - Your fork is automatically configured as the origin remote.
    # Add the main repository as the "upstream" remote
    git remote add upstream https://github.com/apache/sedona-db.git
- Verify the configuration
  - Run the following command to verify that you have two remotes configured correctly: origin (your fork) and upstream (the main repository).
    git remote -v
  - The output should look like this:
    origin https://github.com/YourUsername/sedona-db.git (fetch)
    origin https://github.com/YourUsername/sedona-db.git (push)
    upstream https://github.com/apache/sedona-db.git (fetch)
    upstream https://github.com/apache/sedona-db.git (push)
System dependencies¶
Some crates in the workspace wrap native libraries and require system dependencies (GEOS, GDAL, PROJ, Abseil, OpenSSL, CMake, etc.). We recommend using:
macOS: Homebrew¶
brew install abseil openssl cmake geos gdal proj
Ensure Homebrew-installed tools are on your PATH (Homebrew usually does this automatically).
Windows¶
Suggested workflow (PowerShell):
First, install Rust if it is not already installed:
Invoke-WebRequest https://win.rustup.rs/x86_64 -UseBasicParsing -OutFile rustup-init.exe
.\rustup-init.exe
# Restart PowerShell
rustc --version
cargo --version
Next, install Visual Studio Build Tools (https://visualstudio.microsoft.com/downloads/). Pick "Desktop development with C++" during install.
Next, install CMake (https://cmake.org/). Ensure "Add CMake to system PATH" is selected during installation.
cmake --version
Now, install and bootstrap vcpkg (example path: C:\dev\vcpkg — you can choose a different path; see note below about short paths):
git clone https://github.com/microsoft/vcpkg.git C:\dev\vcpkg
cd C:\dev\vcpkg
.\bootstrap-vcpkg.bat
Next, install the required libraries with vcpkg:
C:\dev\vcpkg\vcpkg.exe install geos gdal proj abseil openssl
Configure environment variables (PowerShell example — update paths as needed):
$env:VCPKG_ROOT = 'C:\dev\vcpkg'
$env:CMAKE_TOOLCHAIN_FILE = "${env:VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake"
# Add pkg-config/ msys path (hash may vary) for using pkg-config command
$env:PATH = "${env:VCPKG_ROOT}/downloads/tools/msys2/<msys-hash>/mingw64/bin/;${env:PATH}"
# Add path to DLLs (without this, the build still succeeds, but loading fails)
$env:PATH = "${env:VCPKG_ROOT}/installed/x64-windows/bin/;${env:PATH}"
# Add other pkg-config related settings
$env:PKG_CONFIG_SYSROOT_DIR = "${env:VCPKG_ROOT}/downloads/tools/msys2/<msys-hash>/mingw64/"
$env:PKG_CONFIG_PATH = "${env:VCPKG_ROOT}/installed/x64-windows/lib/pkgconfig/"
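Note: the downloads/tools/msys2/<msys-hash> folder name varies between installations; replace <msys-hash> with the directory name that exists on your machine.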
VS Code integration (so rust-analyzer sees the toolchain):
Add to your settings.json:
{
"rust-analyzer.runnables.extraEnv": {
"CMAKE_TOOLCHAIN_FILE": "C:/dev/vcpkg/scripts/buildsystems/vcpkg.cmake"
},
"rust-analyzer.cargo.extraEnv": {
"CMAKE_TOOLCHAIN_FILE": "C:/dev/vcpkg/scripts/buildsystems/vcpkg.cmake"
}
}
Linux¶
Linux users may install system dependencies from a system package manager. Note that recent versions are required because the Abseil version required is relatively recent compared to the package version on some common LTS platforms.
Ubuntu/Debian (Ubuntu 24.04 LTS is too old; however, later versions have the required version of Abseil)
sudo apt-get install -y build-essential cmake libssl-dev libproj-dev libgeos-dev libgdal-dev python3-dev libabsl-dev
Rust¶
SedonaDB is written in Rust and is a standard cargo workspace.
If you installed the native dependencies with vcpkg, set the CMake toolchain variable before running cargo test:
export CMAKE_TOOLCHAIN_FILE=/path/to/vcpkg/scripts/buildsystems/vcpkg.cmake
Replace /path/to/vcpkg/ with the actual path to your vcpkg installation.
Once set, you can run: cargo test
This ensures that Cargo and proj-sys can find the correct C/C++ dependencies via CMake.
You can install a recent version of the Rust compiler and cargo from rustup.rs and run tests using cargo test.
A local development version of the CLI can be run with cargo run --bin sedona-cli.
Test data setup¶
Some tests require submodules that contain test data or pinned versions of external dependencies. These submodules can be initialized with:
git submodule init
git submodule update --recursive
Additionally, some of the data required in the tests can be downloaded by running the following script.
python submodules/download-assets.py
Python¶
Python bindings to SedonaDB are built with the Maturin build backend.
To install a development version of the main Python bindings for the first time, run the following commands:
cd python/sedonadb
pip install -e ".[test]"
If editing Rust code in either SedonaDB or the Python bindings, you can recompile the native component with:
maturin develop
If you don't have maturin installed yet, you can install it using pip:
pip install maturin
Debugging¶
Rust¶
Debugging Rust code is most easily done by writing or finding a test that triggers the desired behavior and running it using the Debug selection in VSCode with the rust-analyzer extension. Rust code can also be debugged using the CLI by finding the main() function in sedona-cli and choosing the Debug run option.
Python, C, and C++¶
Installation of Python bindings with maturin develop ensures a debug-friendly build for debugging Rust, Python, or C/C++ code. Python code can be debugged using breakpoints in any IDE that supports debugging an editable Python package installation (e.g., VSCode); Rust, C, or C++ code can be debugged using the CodeLLDB Attach to Process... command from the command palette in VSCode.
Testing¶
Running Rust tests¶
We use cargo to run the Rust tests.
cargo test
Running Python tests¶
Many of the Python tests rely on a running PostGIS instance. You can spin one up using the provided PostGIS Docker Compose file.
docker compose up -d
You can later shut it down with
docker compose down
You can open a shell into the running PostgreSQL server with
docker compose exec postgis psql -U postgres
...which is useful for interactively checking expected behaviour when adding new functions to SedonaDB.
To run the actual Python tests, you can use pytest.
For example, to run all of the tests:
pytest python/sedonadb/tests
Remember that you need to run maturin develop to update your Python installation after changing Rust code.
Linting¶
Install pre-commit. This will automatically run various checks (e.g., formatting) that must pass in CI.
pre-commit install
If pre-commit is not already installed, you can install it using pip.
pip install pre-commit
Additionally, you should run clippy to catch common lints before pushing new Rust changes. This is not included in pre-commit, so this should be run manually. Fix any suggestions it makes, and run it again to make sure there are no other changes to make.
cargo clippy
Low-level benchmarking¶
Low-level Rust benchmarks use criterion. In general, there is at least one benchmark for every implementation of a function (some functions have more than one implementation provided by different libraries), and a few other benchmarks for low-level iteration where work was done to optimize specific cases.
Running benchmarks¶
Benchmarks for a specific crate can be run with cargo bench:
cd rust/sedona-geo
cargo bench
Benchmarks for a specific function can be run with a filter. These can be run from the workspace or a specific crate (although the output is usually easier to read for a specific crate).
cargo bench -- st_area
Managing results¶
By default, criterion saves the last run and will report the difference between the current benchmark and the last time it was run (although there are options to save and load various baselines).
A report of the latest results for all benchmarks can be opened with the following command:
open target/criterion/report/index.html      # macOS
xdg-open target/criterion/report/index.html  # Linux
All previous saved benchmark runs can be cleared with:
rm -rf target/criterion
Implementation conventions¶
Scalar SQL functions¶
For scalar SQL functions with arguments that can be either scalar or array
values, use the executor helpers in
rust/sedona-functions/src/executor.rs instead of manually branching on scalar
and array arguments. For non-geometry arguments, cast the argument once, expand
it to an array of executor.num_iterations() rows with to_array(), iterate it in
lockstep with the geometry executor, and handle nulls in the row loop:
let arg1 = args[1]
    .cast_to(&DataType::Float64, None)?
    .to_array(executor.num_iterations())?;
let arg1_array = as_float64_array(&arg1)?;
let mut arg1_iter = arg1_array.iter();
executor.execute_wkb_void(|maybe_wkb| {
    match (maybe_wkb, arg1_iter.next().unwrap()) {
        (Some(wkb), Some(arg1)) => {
            invoke_scalar(&wkb, arg1, &mut builder)?;
            builder.append_value([]);
        }
        _ => builder.append_null(),
    }
    Ok(())
})?;
Avoid unwrap() and expect() for user or data failures in runtime paths. Use
DataFusion errors for those cases, and reserve panics for local invariants like
iterator lengths that were already fixed with to_array(executor.num_iterations()).
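One way to keep data failures out of panics is to return a Result from the per-row helper and propagate it with ?. The following is a minimal sketch using only the standard library. FunctionError, parse_distance, and parse_all are hypothetical names chosen for illustration, and FunctionError merely stands in for a DataFusion error type, which is not shown here.

```rust
use std::fmt;

/// Hypothetical stand-in for a DataFusion error; real code would return that type instead.
#[derive(Debug, PartialEq)]
pub enum FunctionError {
    Execution(String),
}

impl fmt::Display for FunctionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FunctionError::Execution(msg) => write!(f, "Execution error: {}", msg),
        }
    }
}

impl std::error::Error for FunctionError {}

/// Parse a user-supplied distance. Bad input is a data failure, so it is
/// reported as an error value rather than a panic.
pub fn parse_distance(raw: &str) -> Result<f64, FunctionError> {
    let value = raw.trim().parse::<f64>().map_err(|e| {
        FunctionError::Execution(format!("invalid distance '{}': {}", raw, e))
    })?;
    if value.is_finite() && value >= 0.0 {
        Ok(value)
    } else {
        Err(FunctionError::Execution(format!(
            "distance must be finite and non-negative, got {}",
            value
        )))
    }
}

/// Apply parse_distance to every row, stopping at the first failure.
pub fn parse_all(rows: &[&str]) -> Result<Vec<f64>, FunctionError> {
    rows.iter().map(|raw| parse_distance(raw)).collect()
}
```

Collecting into a Result means a single malformed row surfaces as an error to the caller instead of aborting the process.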
When a scalar function accepts both scalar and array arguments, add
ScalarUdfTester coverage for both paths and for null handling.
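To make the scalar path, the array path, and null handling concrete without depending on Arrow or DataFusion, here is a minimal sketch using only the standard library. Arg, to_array, and scale_all are hypothetical stand-ins (not SedonaDB APIs) for a scalar-or-array argument, its expansion to a fixed number of rows, and a lockstep row loop.

```rust
/// Hypothetical stand-in for an argument that is either one scalar or a full column.
#[derive(Debug, Clone)]
pub enum Arg {
    Scalar(Option<f64>),
    Array(Vec<Option<f64>>),
}

impl Arg {
    /// Expand to exactly `num_rows` values so the row loop never branches
    /// on scalar versus array input.
    pub fn to_array(&self, num_rows: usize) -> Vec<Option<f64>> {
        match self {
            Arg::Scalar(value) => vec![*value; num_rows],
            Arg::Array(values) => {
                // Local invariant: an array argument must match the batch length.
                assert_eq!(values.len(), num_rows, "array argument has wrong length");
                values.clone()
            }
        }
    }
}

/// Multiply each input by the matching factor; a null on either side yields a null row.
pub fn scale_all(inputs: &[Option<f64>], factor_arg: &Arg) -> Vec<Option<f64>> {
    let factors = factor_arg.to_array(inputs.len());
    let mut factor_iter = factors.iter();
    let mut out = Vec::with_capacity(inputs.len());
    for maybe_input in inputs {
        // Lengths were fixed by to_array(), so next() cannot return None here.
        match (maybe_input, factor_iter.next().unwrap()) {
            (Some(input), Some(factor)) => out.push(Some(input * factor)),
            _ => out.push(None),
        }
    }
    out
}
```

Calling scale_all once with Arg::Scalar and once with an Arg::Array of the same length exercises both paths; including a None in the inputs and in the factor covers null handling on each side.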
Documentation¶
To contribute to the SedonaDB documentation:
- Fork the repository and clone your fork.
- Install the documentation dependencies:
    pip install -r docs/requirements.txt
- Make your changes to the documentation files.
- Preview your changes locally using these commands:
  - mkdocs serve - Start the live-reloading docs server.
  - mkdocs build - Build the documentation site.
  - mkdocs -h - Print help message and exit.
- Push your changes and open a pull request.
The SQL function reference is special: because we provide so many functions, we have
a specialized syntax for documenting them. The minimum required documentation for
a function is a file docs/reference/sql/function_name.qmd:
---
title: ST_FunctionName
description: A brief one sentence description of what the function does.
kernels:
  - returns: geometry
    args: [geometry]
---
## Examples
```sql
SELECT ST_FunctionName(ST_Point(0, 1)) AS val;
```
After writing this file, the .md file may be rendered using Quarto:
cd docs/reference/sql
quarto render
This command (1) expands description and kernels to a templated representation,
(2) checks and renders the result of the SQL examples, and (3) executes any
Python code chunks. These may
be used to render figures that demonstrate visually what a function does or how its
parameters affect the result.
The kernels section of the frontmatter allows multiple implementations of a function
to be documented. For example, many functions include implementations for geometry
and geography or allow extra arguments to be supplied to customize behaviour. As
an example, the frontmatter for ST_Buffer() is:
---
title: ST_Buffer
description: >
  Computes a geometry that represents all points whose distance from the input
  geometry is less than or equal to a specified distance.
kernels:
  - returns: geometry
    args:
    - geometry
    - name: distance
      type: float64
      description: Radius of the buffer
  - returns: geometry
    args:
    - geometry
    - name: distance
      type: float64
    - name: params
      type: utf8
      description: Space-separated `key=value` parameters.
---
This illustrates a few ways in which arguments can be defined:
- By the string geometry, geography, or raster. These are expanded to a full definition by Quarto but are so common that we allow abbreviating them to avoid typing description: Input geometry for every single function.
- With a YAML object of name/type/description. The type names are lowercase Arrow type names which should be identical to those printed when executing a query in SedonaDB.
The build system for function documentation is a work in progress, so be sure to ask if you run into problems or have any questions about the syntax!
Publishing the GPU Docker image¶
The GPU image built from docker/sedonadb-gpu.dockerfile is published to Docker Hub as
apache/sedona under the sedonadb-latest tag.
Publishing requires push access to the apache organization, so run docker login with an
authorized account first. The image is multi-architecture (linux/amd64 and linux/arm64), so
you also need a Buildx builder that can target multiple platforms — create one once per machine
with docker buildx create --use.
To build and push the image, run the helper script from the repository root:
docker/build.sh release apache/sedona:sedonadb-latest
The release mode builds both CPU architectures and pushes the result straight to the registry. A
final argument overrides the CUDA target (the default is 75;86;89, a fat binary covering Turing,
Ampere, and Ada Lovelace GPUs such as the T4, A10G, and L4). Because each CPU architecture compiles
CUDA, Rust, and vcpkg dependencies for every CUDA target, this is best run on native amd64 and
arm64 hardware (such as CI runners): building the non-native architecture locally falls back to
QEMU emulation and is very slow.