🎉 Apache Sedona 1.8.1 is now available! Check out the new features and improvements.

The official source for Apache Sedona news, technical insights, release updates, and best practices in large-scale spatial data management.

Spatial Query Benchmarking on Databricks with SpatialBench

Recently Databricks announced that Serverless SQL users "will see up to 17x faster performance compared to classic clusters with Apache Sedona installed." Unfortunately, Databricks didn’t speak to the cost of the results. It is also an apples-to-oranges comparison, because the shape and quantity of serverless compute that Databricks deployed to generate the 17x performance difference were not shared. Their result was also limited to specific query configurations.

We saw an opportunity to address these issues using SpatialBench, a new benchmarking framework for spatial queries. Because we are comparing different infrastructure types, our benchmark normalizes on price-performance rather than performance alone, while providing what we believe is a more comprehensive benchmarking result.
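To make the idea of normalizing on price-performance concrete, here is a minimal sketch. It assumes the cost of a query run is simply its runtime multiplied by the hourly price of the compute it ran on. The function names, runtimes, and hourly rates are all hypothetical and are not taken from the benchmark.

```python
def run_cost(runtime_seconds: float, hourly_rate_usd: float) -> float:
    """Cost in USD of one query run: runtime in hours times the hourly price."""
    return runtime_seconds / 3600.0 * hourly_rate_usd


def price_performance_advantage(cost_a: float, cost_b: float) -> float:
    """How many times cheaper engine A is than engine B for the same query."""
    return cost_b / cost_a


# Hypothetical numbers, for illustration only.
# Engine "fast": finishes in 60 s on compute priced at $12.00/hour.
# Engine "slow": finishes in 240 s on compute priced at $2.00/hour.
cost_fast = run_cost(runtime_seconds=60, hourly_rate_usd=12.0)
cost_slow = run_cost(runtime_seconds=240, hourly_rate_usd=2.0)

# Advantage of the slower engine once price is taken into account.
advantage = price_performance_advantage(cost_slow, cost_fast)

print(f"fast engine cost: ${cost_fast:.4f}")
print(f"slow engine cost: ${cost_slow:.4f}")
print(f"slow engine price-performance advantage: {advantage:.2f}x")
```

In this made-up case the slower engine takes four times as long but still costs less per query, which is why raw speedup alone can be misleading when the underlying infrastructure differs.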

We found that only one of the simpler SpatialBench queries (#2) tested on Databricks SQL Serverless had price-performance aligned with the Databricks claim. However, we found that Sedona excelled in most other queries, delivering up to 6x better price-performance, while offering more query coverage.

Introducing SpatialBench: performance benchmarks for spatial database queries

SpatialBench is a benchmarking framework for spatial joins, distance queries, and point-in-polygon analyses.
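As an illustration of what a point-in-polygon analysis computes, here is a minimal pure-Python sketch using the classic ray-casting test. It is not how SpatialBench or any of the benchmarked engines implement the operation. The function name and the sample data are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a horizontal ray
    from the point crosses; an odd count means the point is inside.
    Points lying exactly on the boundary are not handled specially."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical data: one square "zone" and three points.
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
points = [(1.0, 1.0), (5.0, 2.0), (2.0, 3.0)]

# A naive point-in-polygon filter: keep the points that fall inside the zone.
matches = [p for p in points if point_in_polygon(p, square)]
print(matches)
```

A real engine would typically avoid testing every point against every polygon, for example by using a spatial index. That kind of optimization is part of what a spatial benchmark is meant to measure.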

Traditional benchmarking frameworks don’t include spatial workflows. It’s important to benchmark spatial workflows separately because an engine that’s fast for tabular data analyses isn’t necessarily performant for spatial queries.

For example, here are the SpatialBench results for Scale Factor 1 (SF-1) and SF-10 for SedonaDB, DuckDB, and GeoPandas on a single EC2 instance:

SedonaDB 0.2.0 Release

The Apache Sedona community is excited to announce the release of SedonaDB version 0.2.0!

SedonaDB is the first open-source, single-node analytical database engine that treats spatial data as a first-class citizen. It is developed as a subproject of Apache Sedona. This release resolves 136 issues, including 40 new functions, with contributions from 17 contributors.

Apache Sedona powers large-scale geospatial processing on distributed engines like Spark (SedonaSpark), Flink (SedonaFlink), and Snowflake (SedonaSnow). SedonaDB extends the Sedona ecosystem with a single-node engine optimized for small-to-medium data analytics, delivering the simplicity and speed that distributed systems often cannot.

Introducing SedonaDB: A single-node analytical database engine with geospatial as a first-class citizen

The Apache Sedona community is excited to announce the initial release of SedonaDB! 🎉

SedonaDB is the first open-source, single-node analytical database engine that treats spatial data as a first-class citizen. It is developed as a subproject of Apache Sedona.

Apache Sedona powers large-scale geospatial processing on distributed engines like Spark (SedonaSpark), Flink (SedonaFlink), and Snowflake (SedonaSnow). SedonaDB extends the Sedona ecosystem with a single-node engine optimized for small-to-medium data analytics, delivering the simplicity and speed that distributed systems often cannot.

Welcome to the Apache Sedona Blog!

Welcome to the brand-new blog for Apache Sedona!

For several years, Apache Sedona has been the go-to open-source engine for processing massive geospatial datasets, extending Apache Spark to handle complex spatial operations with unparalleled speed and efficiency. Sedona's capabilities also extend beyond Spark, bringing spatial analytics to the Snowflake data warehouse with SedonaSnow and the real-time streaming engine Apache Flink with a Spatial SQL integration.