
Apache Sedona (incubating) is a cluster computing system for processing large-scale spatial data. Sedona extends Apache Spark and Spark SQL with a set of out-of-the-box Spatial Resilient Distributed Datasets and Spatial SQL that efficiently load, process, and analyze large-scale spatial data across machines.

Set up in 5 minutes with Maven or SBT. No installation required.

The Python API is also available on PyPI.




High Speed

According to our benchmarks and third-party research papers, Sedona runs 2x to 10x faster than other Spark-based geospatial data systems on compute-intensive query workloads.
Figure: Execution time of spatial join with polygons

Low Memory Consumption

According to our benchmarks and third-party research papers, Sedona uses 50% less peak memory than other Spark-based geospatial data systems for large-scale in-memory query processing.
Figure: Peak memory consumption of spatial join with polygons

Ease of Use

Sedona offers Scala, Java, Python, and Spatial SQL APIs and integrates them carefully into Apache Spark. You can easily create spatial analytics and data mining applications and run them in any Spark environment.
SELECT superhero.name
FROM city, superhero
WHERE ST_Contains(city.geom, superhero.geom)
AND city.name = 'Gotham'
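
To make the semantics of the query above concrete, here is a minimal plain-Python sketch of what it computes: keep the city named 'Gotham', then return every superhero whose point lies inside that city's polygon. This is not Sedona code and not how Sedona evaluates the join. The sample rows, the `contains` helper, and `heroes_in_city` are hypothetical names made up for illustration, and the containment test is a simple ray-casting check that does not treat points exactly on the boundary specially.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # vertices in order; the last vertex connects back to the first


def contains(polygon: Polygon, point: Point) -> bool:
    """Ray-casting point-in-polygon test (illustrative stand-in for ST_Contains)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical sample data standing in for the `city` and `superhero` tables.
cities: List[Dict] = [
    {"name": "Gotham", "geom": [(0, 0), (10, 0), (10, 10), (0, 10)]},
    {"name": "Metropolis", "geom": [(20, 0), (30, 0), (30, 10), (20, 10)]},
]
superheroes: List[Dict] = [
    {"name": "Batman", "geom": (5, 5)},
    {"name": "Superman", "geom": (25, 5)},
    {"name": "Robin", "geom": (2, 8)},
]


def heroes_in_city(city_name: str) -> List[str]:
    """Nested-loop equivalent of the SQL query, parameterized by city name."""
    return [
        hero["name"]
        for city in cities
        if city["name"] == city_name
        for hero in superheroes
        if contains(city["geom"], hero["geom"])
    ]


print(heroes_in_city("Gotham"))  # ['Batman', 'Robin']
```

The nested loop compares every city with every superhero, which is fine for a handful of rows but is exactly the cost a distributed spatial engine is meant to avoid on large datasets.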