2025 was a milestone year for Apache Sedona. We made major progress in distributed spatial analytics on Spark, Flink, and Snowflake, launched a new single-node engine called SedonaDB, and pushed forward benchmarking and open geospatial data standards.
This post summarizes the most important highlights from the Apache Sedona ecosystem in 2025.
Recently, Databricks announced that Serverless SQL users "will see up to 17x faster performance compared to classic clusters with Apache Sedona installed." Unfortunately, Databricks did not disclose the cost of achieving those results. The comparison is also apples-to-oranges, because Databricks did not share the shape or quantity of the serverless compute it used to produce the 17x difference. Their result was also limited to specific query configurations.
We saw an opportunity to address these issues using SpatialBench, a new benchmarking framework for spatial queries. Because we are comparing different infrastructure types, our benchmark normalizes results by price-performance rather than by raw performance alone, and we believe it provides a more comprehensive benchmarking result.
We found that only one of the simpler SpatialBench queries (#2) run on Databricks SQL Serverless delivered price-performance in line with the Databricks claim. On most other queries, Sedona excelled, delivering up to 6x better price-performance while also covering more of the benchmark queries.
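To make the difference between raw speed and price-performance concrete, here is a minimal sketch. It assumes the cost of a query run is simply its runtime multiplied by the hourly price of the compute used, ignoring startup time, idle time, and any per-query minimums; the `Run` class and all numbers are hypothetical and are not SpatialBench code or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One benchmark run: wall-clock time and the hourly price of the compute used."""
    engine: str
    runtime_seconds: float
    price_per_hour: float  # hypothetical USD/hour for the whole cluster or warehouse

    @property
    def cost(self) -> float:
        # Assumed cost model: runtime (converted to hours) * hourly price.
        return self.runtime_seconds / 3600.0 * self.price_per_hour


def speedup(a: Run, b: Run) -> float:
    """Raw performance ratio: how many times faster run `a` is than run `b`."""
    return b.runtime_seconds / a.runtime_seconds


def price_performance_ratio(a: Run, b: Run) -> float:
    """How many times cheaper run `a` is than run `b` for the same query.

    A value of 2.0 means `a` produced the same result for half the cost.
    """
    return b.cost / a.cost


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; these are not benchmark results.
    fast_but_pricey = Run("engine_a", runtime_seconds=30, price_per_hour=40.0)
    slow_but_cheap = Run("engine_b", runtime_seconds=120, price_per_hour=5.0)
    print(f"A is {speedup(fast_but_pricey, slow_but_cheap):.1f}x faster than B")
    print(f"B is {price_performance_ratio(slow_but_cheap, fast_but_pricey):.1f}x cheaper than A")
```

With these made-up inputs, engine A is 4x faster, yet engine B answers the same query at half the cost, which is why a speedup figure alone says little when the two sides run on differently priced infrastructure.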
SpatialBench is a benchmarking framework for spatial joins, distance queries, and point-in-polygon analyses.
Traditional benchmarking frameworks don’t include spatial workflows. It’s important to benchmark spatial workflows separately because an engine that’s fast for tabular data analyses isn’t necessarily performant for spatial queries.
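As an illustration of why spatial queries stress an engine differently from tabular ones, here is a minimal point-in-polygon join in plain Python. It is a sketch, not SpatialBench or Sedona code: it uses a naive nested loop with a cheap bounding-box filter followed by an exact ray-casting (even-odd) refinement, whereas production engines typically add spatial indexes and partitioning. The function names are hypothetical, and points lying exactly on a polygon edge are not handled consistently.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # vertices in order; the last vertex connects back to the first


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Even-odd (ray casting) test: count edge crossings of a ray going right from pt."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge straddle the horizontal line through pt?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bounding_box(poly: Polygon) -> Tuple[float, float, float, float]:
    xs = [px for px, _ in poly]
    ys = [py for _, py in poly]
    return min(xs), min(ys), max(xs), max(ys)


def spatial_join(points: List[Point], polygons: List[Polygon]) -> List[Tuple[int, int]]:
    """Return (point_index, polygon_index) pairs where the point lies inside the polygon."""
    boxes = [bounding_box(p) for p in polygons]
    matches = []
    for pi, (x, y) in enumerate(points):
        for gi, (poly, (xmin, ymin, xmax, ymax)) in enumerate(zip(polygons, boxes)):
            # Filter: a cheap bounding-box check rejects most candidate pairs.
            if not (xmin <= x <= xmax and ymin <= y <= ymax):
                continue
            # Refine: the exact, more expensive geometric test.
            if point_in_polygon((x, y), poly):
                matches.append((pi, gi))
    return matches


if __name__ == "__main__":
    square_a = [(0, 0), (2, 0), (2, 2), (0, 2)]
    square_b = [(3, 0), (5, 0), (5, 2), (3, 2)]
    pts = [(1, 1), (4, 1), (2.5, 1), (1, 3)]
    print(spatial_join(pts, [square_a, square_b]))
```

Even this toy version shows the shape of the work: the number of candidate pairs grows with points × polygons, and each surviving pair needs per-edge arithmetic, which is the kind of cost a tabular-only benchmark never exercises.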
For example, SpatialBench was used to compare SedonaDB, DuckDB, and GeoPandas at Scale Factor 1 (SF-1) and SF-10 on a single EC2 instance.
The Apache Sedona community is excited to announce the release of SedonaDB version 0.2.0!
SedonaDB is the first open-source, single-node analytical database engine that treats spatial data as a first-class citizen. It is developed as a subproject of Apache Sedona. This release resolves 136 issues, including 40 new functions, with contributions from 17 contributors.
Apache Sedona powers large-scale geospatial processing on distributed engines like Spark (SedonaSpark), Flink (SedonaFlink), and Snowflake (SedonaSnow). SedonaDB extends the Sedona ecosystem with a single-node engine optimized for small-to-medium data analytics, delivering the simplicity and speed that distributed systems often cannot.
The Apache Sedona community is excited to announce the initial release of SedonaDB! 🎉
TL;DR: The H3 spatial index provides a number of spatial functions and a consistent grid system for efficient data aggregation and visualization. H3 is an approximation that makes some computations run faster but less accurately. Sedona supports the H3 spatial index, but precise computations are often preferable, especially when accuracy is important.
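The accuracy trade-off can be sketched without H3 itself. H3 uses hexagonal cells on the globe; the example below instead uses a hypothetical planar square grid as a simplified stand-in (it is neither the H3 library nor Sedona's H3 functions) to show how "same cell" lookups approximate an exact distance query. Cell lookups are cheap because they compare integer keys, but cell boundaries produce both false negatives and false positives.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def cell_id(x: float, y: float, cell_size: float) -> Tuple[int, int]:
    """Hypothetical square-grid index: integer (column, row) of the cell containing (x, y)."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def neighbors_exact(query: Point, points: List[Point], radius: float) -> List[Point]:
    """Precise answer: every point within `radius` of the query (planar distance)."""
    qx, qy = query
    return [p for p in points if math.hypot(p[0] - qx, p[1] - qy) <= radius]


def neighbors_same_cell(query: Point, points: List[Point], cell_size: float) -> List[Point]:
    """Approximate answer: every point whose grid cell matches the query's cell."""
    target = cell_id(query[0], query[1], cell_size)
    return [p for p in points if cell_id(p[0], p[1], cell_size) == target]


if __name__ == "__main__":
    query = (0.95, 0.5)
    # (1.05, 0.5) is very close to the query but falls in the neighboring cell;
    # (0.1, 0.5) is far from the query but shares its cell.
    pts = [(1.05, 0.5), (0.1, 0.5)]
    print("exact:", neighbors_exact(query, pts, radius=0.2))
    print("grid: ", neighbors_same_cell(query, pts, cell_size=1.0))
```

Here the exact query returns only the nearby point, while the grid lookup misses it and instead returns the distant point that happens to share the query's cell. Finer cells shrink these errors but never remove them, which is why precise computations are preferable when accuracy matters.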
For several years, Apache Sedona has been the go-to open-source engine for processing massive geospatial datasets, extending Apache Spark to handle complex spatial operations with unparalleled speed and efficiency. Sedona's capabilities also extend beyond Spark: it brings spatial analytics to the Snowflake data warehouse with SedonaSnow and to the real-time streaming engine Apache Flink through a Spatial SQL integration.