
Install Sedona Python

Click Binder to try the interactive Sedona Python Jupyter Notebook immediately!

Apache Sedona extends pyspark functions, which depend on the following libraries:

  • pyspark
  • shapely
  • attrs

Install any of these packages that your system does not already have. See "packages" in our Pipfile.
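One way to see which of these packages are already present is to probe for them without importing them. The following is a minimal sketch, not part of Sedona: the helper name `missing_packages` is hypothetical, and it assumes the import names `pyspark`, `shapely`, and `attr` (the module provided by the `attrs` distribution).

```python
from importlib.util import find_spec

# Map of pip distribution name -> top-level module name it provides.
# Assumption: the "attrs" distribution is probed through its "attr" module.
REQUIRED = {"pyspark": "pyspark", "shapely": "shapely", "attrs": "attr"}


def missing_packages(required=REQUIRED):
    """Return the pip names of required packages that cannot be found."""
    return sorted(
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None  # None means the module is not installed
    )


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Any name it reports can then be installed with pip.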

Install sedona

  • Installing from PyPI
pip install apache-sedona
  • Since Sedona v1.1.0, pyspark is an optional dependency of Sedona Python because Spark comes pre-installed on many Spark platforms. To install pyspark along with Sedona Python in one go, use the spark extra:
pip install apache-sedona[spark]
  • Installing from Sedona Python source

Clone the Sedona source code from GitHub and run the following commands:

cd python
python3 setup.py install

Prepare sedona-spark-shaded jar

Sedona Python needs one additional jar file called sedona-spark-shaded to work properly. Please make sure you use the correct version for Spark and Scala.

  • For Spark 3.0 to 3.3 and Scala 2.12, it is called sedona-spark-shaded-3.0_2.12-1.4.1.jar
  • For Spark 3.4+ and Scala 2.12, it is called sedona-spark-shaded-3.4_2.12-1.4.1.jar. If you are using Spark versions higher than 3.4, please replace the 3.4 in artifact names with the corresponding major.minor version numbers.
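The naming rule above can be sketched as a small helper. This is only an illustration under stated assumptions: `shaded_jar_name` is a hypothetical function, not a Sedona API, it follows the 1.4.1 naming scheme described here, and it does not check that an artifact has actually been published for a given combination.

```python
def shaded_jar_name(spark_version, scala_version="2.12", sedona_version="1.4.1"):
    """Build the sedona-spark-shaded jar file name for a Spark version string."""
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    if (major, minor) < (3, 0):
        raise ValueError(f"Spark {spark_version} is not covered by the naming rule")
    # Spark 3.0 to 3.3 share the "3.0" artifact; 3.4 and later use their own major.minor.
    spark_tag = "3.0" if (major, minor) < (3, 4) else f"{major}.{minor}"
    return f"sedona-spark-shaded-{spark_tag}_{scala_version}-{sedona_version}.jar"


if __name__ == "__main__":
    for version in ("3.2.1", "3.4.0"):
        print(version, "->", shaded_jar_name(version))
```

Passing the full Spark version (for example the value of `pyspark.__version__`) is enough, since only the major and minor parts are used.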

You can get it using one of the following methods:

  1. Compile it from source within the main project directory and copy the resulting jar (in the spark-shaded/target folder) to the SPARK_HOME/jars/ folder

  2. Download it from the GitHub release and copy it to the SPARK_HOME/jars/ folder

  3. Reference the Maven Central coordinates in your Python program. For example, with Sedona >= 1.4.1:
from sedona.spark import *
config = SedonaContext.builder(). \
    config('spark.jars.packages',
           'org.apache.sedona:sedona-spark-shaded-3.0_2.12:1.4.1,'
           'org.datasyslab:geotools-wrapper:1.4.0-28.2'). \
    getOrCreate()
sedona = SedonaContext.create(config)

With Sedona < 1.4.1:

SedonaRegistrator is deprecated in Sedona 1.4.1 and later versions. Please use the above method instead.

from pyspark.sql import SparkSession
from sedona.register import SedonaRegistrator
from sedona.utils import SedonaKryoRegistrator, KryoSerializer
spark = SparkSession. \
    builder. \
    appName('appName'). \
    config("spark.serializer", KryoSerializer.getName). \
    config("spark.kryo.registrator", SedonaKryoRegistrator.getName). \
    config('spark.jars.packages',
           'org.apache.sedona:sedona-spark-shaded-3.0_2.12:1.4.1,'
           'org.datasyslab:geotools-wrapper:1.4.0-28.2'). \
    getOrCreate()
SedonaRegistrator.registerAll(spark)
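Both examples pass the same comma-separated string of Maven coordinates to `spark.jars.packages`. If you would rather assemble that string from version numbers than hard-code it, a minimal sketch could look like the following. The helper `sedona_packages` and its parameter names are hypothetical. The default values are the ones used in the examples above.

```python
def sedona_packages(spark_tag="3.0", scala_version="2.12",
                    sedona_version="1.4.1", geotools_version="1.4.0-28.2"):
    """Return a value for spark.jars.packages: two Maven coordinates joined by a comma."""
    # Maven coordinates have the form groupId:artifactId:version.
    sedona = (
        f"org.apache.sedona:sedona-spark-shaded-{spark_tag}_{scala_version}"
        f":{sedona_version}"
    )
    geotools = f"org.datasyslab:geotools-wrapper:{geotools_version}"
    return ",".join([sedona, geotools])


if __name__ == "__main__":
    print(sedona_packages())
```

The returned string can be passed as the second argument of `config('spark.jars.packages', ...)` in either example.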

Warning

If you are going to use the Sedona CRS transformation and ShapefileReader functions, you have to use Method 1 or 3. These functions internally use GeoTools libraries, which are under the LGPL license, so the Apache Sedona binary release cannot include them.

Setup environment variables

If you manually copy the sedona-spark-shaded jar to the SPARK_HOME/jars/ folder, you need to set up two environment variables:

  • SPARK_HOME. For example, run the following command in your terminal:
export SPARK_HOME=~/Downloads/spark-3.0.1-bin-hadoop2.7
  • PYTHONPATH. For example, run the following command in your terminal:
export PYTHONPATH=$SPARK_HOME/python
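Before starting a notebook, it can help to confirm that both variables are visible to Python and that the jar is where Spark will look for it. The following is a rough sketch under stated assumptions: `check_sedona_env` is a hypothetical helper, not part of Sedona, and it only looks for a file matching `sedona-spark-shaded-*.jar` in `SPARK_HOME/jars`.

```python
import os
from pathlib import Path


def check_sedona_env(env=None):
    """Return a list of problems found; an empty list means the setup looks fine."""
    env = os.environ if env is None else env

    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        return ["SPARK_HOME is not set"]

    problems = []
    # expanduser handles values such as ~/Downloads/... that a shell did not expand.
    home = Path(spark_home).expanduser()

    # The manually copied jar is expected in SPARK_HOME/jars.
    if not any((home / "jars").glob("sedona-spark-shaded-*.jar")):
        problems.append(f"no sedona-spark-shaded jar found in {home / 'jars'}")

    # PYTHONPATH may hold several entries separated by os.pathsep.
    expected = home / "python"
    entries = [
        Path(entry).expanduser()
        for entry in env.get("PYTHONPATH", "").split(os.pathsep)
        if entry
    ]
    if expected not in entries:
        problems.append(f"PYTHONPATH does not include {expected}")

    return problems


if __name__ == "__main__":
    for problem in check_sedona_env():
        print("Problem:", problem)
```

The function accepts an optional mapping so it can be tried against a sample configuration without changing the real environment.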

You can then try the Sedona Python Jupyter notebook.


Last update: June 6, 2023 07:38:32