Quickstart

Installation

Install from source:

git clone https://github.com/apache/sedona-spatialbench.git
cd sedona-spatialbench
cargo install --path spatialbench-cli

After installation, you should be able to run:

spatialbench-cli --help

Generate SF1 Data

To generate the full dataset at scale factor 1 in Parquet format:

spatialbench-cli --scale-factor 1

This creates six tables:

* trip
* customer
* driver
* vehicle
* zone
* building

Output is written to the current directory by default.

Customizing Output Files

We'll go over a few common options to customize the output files. To see all available options, run spatialbench-cli --help.

Generate a Subset of Tables

spatialbench-cli --scale-factor 1 --tables trip,building

Partition Table Output into Multiple Files

Specify the number of partitions manually:

spatialbench-cli --scale-factor 10 --tables trip --parts 4

Or let the CLI determine the number of files from a target file size in MB:

spatialbench-cli --scale-factor 10 --mb-per-file 512
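As a rough way to reason about the target-size option, the sketch below estimates how many files a table might be split into. It assumes the file count is about the table size divided by the target size, rounded up; the CLI's actual sizing logic is not described here and may differ. The function name and the 2000 MB table size are hypothetical, chosen only for illustration.

```python
import math


def estimated_file_count(table_size_mb: float, mb_per_file: float) -> int:
    """Estimate the number of output files for a target size per file.

    Assumption: files ~= ceil(table size / target size). This is an
    illustrative model, not the CLI's documented behavior.
    """
    if table_size_mb <= 0 or mb_per_file <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(table_size_mb / mb_per_file))


# Hypothetical 2000 MB table with a 512 MB target: 2000 / 512 = 3.9, rounded up.
print(estimated_file_count(2000, 512))  # prints 4
```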

Set Output Directory

spatialbench-cli --scale-factor 1 --output-dir data/sf1
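When generation is scripted, it can help to assemble the argument list in one place. The following is a minimal sketch of a hypothetical helper, `build_command`, that uses only the flags shown in this section. It assumes the flags can be combined freely, and it treats `--parts` and `--mb-per-file` as alternatives; that restriction is a choice made in this helper, not a documented rule of the CLI.

```python
import shlex
from typing import List, Optional, Sequence


def build_command(
    scale_factor: int,
    tables: Optional[Sequence[str]] = None,
    parts: Optional[int] = None,
    mb_per_file: Optional[int] = None,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Assemble a spatialbench-cli argument list (hypothetical helper)."""
    if parts is not None and mb_per_file is not None:
        # Assumption: the two partitioning options are used one at a time.
        raise ValueError("pass either parts or mb_per_file, not both")

    cmd = ["spatialbench-cli", "--scale-factor", str(scale_factor)]
    if tables:
        # The CLI takes table names as a comma-separated list.
        cmd += ["--tables", ",".join(tables)]
    if parts is not None:
        cmd += ["--parts", str(parts)]
    if mb_per_file is not None:
        cmd += ["--mb-per-file", str(mb_per_file)]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    return cmd


# Print the command instead of running it, so the sketch has no side effects.
print(shlex.join(build_command(10, tables=["trip"], parts=4, output_dir="data/sf10")))
# prints: spatialbench-cli --scale-factor 10 --tables trip --parts 4 --output-dir data/sf10
```

To actually run the command, the returned list can be passed to `subprocess.run(cmd, check=True)`.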

Configuring Spatial Distributions

SpatialBench uses a spatial data generator to produce synthetic points and polygons that follow realistic spatial distributions.

To read more about the different spatial distributions offered by SpatialBench, see here. For details on tuning the spatial distributions, the full YAML schema, and examples, see CONFIGURATION.md.

You can override the default distribution settings at runtime by passing a YAML file via the --config flag:

spatialbench-cli --scale-factor 1 --config spatialbench-config.yml

If --config is not provided, SpatialBench checks for ./spatialbench-config.yml. If absent, it falls back to built-in defaults.
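The lookup order described above can be sketched as a small function. This illustrates the precedence only and is not SpatialBench's actual implementation; `resolve_config` and its parameters are hypothetical names, and a return value of `None` stands for "use the built-in defaults".

```python
from pathlib import Path
from typing import Optional

DEFAULT_CONFIG_NAME = "spatialbench-config.yml"


def resolve_config(explicit: Optional[str], cwd: Path) -> Optional[Path]:
    """Return the config file to use, or None for built-in defaults.

    Precedence (as described in the text):
      1. the path passed via --config
      2. ./spatialbench-config.yml in the working directory
      3. built-in defaults (represented here as None)
    """
    if explicit is not None:
        return Path(explicit)
    candidate = cwd / DEFAULT_CONFIG_NAME
    if candidate.is_file():
        return candidate
    return None


# With no --config value, the result depends on whether the default file exists.
print(resolve_config(None, Path.cwd()))
```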