Quickstart
Installation¶
Install from source:
git clone https://github.com/apache/sedona-spatialbench.git
cd sedona-spatialbench
cargo install --path spatialbench-cli
After installation, you should be able to run:
spatialbench-cli --help
Generate SF1 Data¶
To generate the full dataset at scale factor 1 in Parquet format:
spatialbench-cli --scale-factor 1
This creates six tables: * trip * customer * driver * vehicle * zone * building
Output is written to the current directory by default.
Customizing Output Files¶
We'll go over a few common options to customize the output files. To see all available options, run spatialbench-cli --help.
Generate a Subset of Tables¶
spatialbench-cli --scale-factor 1 --tables trip,building
Partition Table Output into Multiple Files¶
Specify the number of partitions manually:
spatialbench-cli --scale-factor 10 --tables trip --parts 4
Or let the CLI determine the number of files using target size:
spatialbench-cli --scale-factor 10 --mb-per-file 512
Set Output Directory¶
spatialbench-cli --scale-factor 1 --output-dir data/sf1
Configuring Spatial Distributions¶
SpatialBench uses a spatial data generator to generate synthetic points and polygons using realistic spatial distributions.
To read more about the different spatial distributions offered by SpatialBench see here. For more details about tuning the spatial distributions and the full YAML schema and examples, see CONFIGURATION.md.
You can override these defaults at runtime by passing a YAML file via the --config flag:
spatialbench-cli --scale-factor 1 --config spatialbench-config.yml
If --config is not provided, SpatialBench checks for ./spatialbench-config.yml. If absent, it falls back to built-in defaults.