Install Sedona Python
Click and play the interactive Sedona Python Jupyter Notebook immediately!
Apache Sedona extends pyspark functions and depends on the following libraries:
- pyspark
- shapely
- attrs
You need to install necessary packages if your system does not have them installed. See "packages" in our Pipfile.
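One way to see which of the libraries listed above still need installing is to query the installed distribution metadata. This is a minimal sketch using only the Python standard library; the helper name `missing_packages` is hypothetical and not part of Sedona.

```python
from importlib import metadata


def missing_packages(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # The dependencies listed above, given as PyPI distribution names.
    required = ["pyspark", "shapely", "attrs"]
    print("Missing packages:", missing_packages(required))
```

An empty list means all three dependencies are already present in the current environment.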
Install sedona
- Installing from PyPI repositories. You can find the latest Sedona Python on PyPI. There is a known issue in Sedona v1.0.1 and earlier versions.
pip install apache-sedona
- Since Sedona v1.1.0, pyspark is an optional dependency of Sedona Python because spark comes pre-installed on many spark platforms. To install pyspark along with Sedona Python in one go, use the spark extra:
pip install apache-sedona[spark]
- Installing from Sedona Python source
Clone the Sedona GitHub source code and run the following commands:
cd python
python3 setup.py install
Prepare sedona-spark jar
Sedona Python needs one additional jar file called sedona-spark-shaded or sedona-spark to work properly. Please make sure you use the correct version for Spark and Scala.
Please use the Spark major.minor version number in artifact names.
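The naming rule can be sketched as a small helper that assembles a Maven coordinate from the Spark, Scala, and Sedona versions. The pattern is inferred from the coordinates shown in the examples on this page; the function name `sedona_coordinate` is hypothetical, and the sketch does not check whether a given combination is actually published on Maven Central.

```python
def sedona_coordinate(spark_version, scala_version, sedona_version, shaded=False):
    """Build a Maven coordinate such as
    'org.apache.sedona:sedona-spark-3.3_2.12:1.6.1'."""
    # Keep only major.minor of the Spark version, e.g. '3.3.2' -> '3.3'.
    spark_major_minor = ".".join(spark_version.split(".")[:2])
    # Assumed artifact prefixes, taken from the examples on this page.
    artifact = "sedona-spark-shaded" if shaded else "sedona-spark"
    return (
        f"org.apache.sedona:{artifact}-{spark_major_minor}"
        f"_{scala_version}:{sedona_version}"
    )


if __name__ == "__main__":
    print(sedona_coordinate("3.3.2", "2.12", "1.6.1"))
    print(sedona_coordinate("3.3.2", "2.12", "1.6.1", shaded=True))
```

Passing the full Spark version is safe here because the patch component is dropped before the artifact name is built.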
You can get the jar using one of the following methods:
- If you run Sedona in Databricks, AWS EMR, or another cloud platform's notebook, use the shaded jar: download the sedona-spark-shaded jar and the geotools-wrapper jar from Maven Central, and put them in the SPARK_HOME/jars/ folder.
- If you run Sedona in an IDE or a local Jupyter notebook, use the unshaded jar. Reference the Maven Central coordinates in your Python program. For example:
Sedona >= 1.4.1
from sedona.spark import *

config = SedonaContext.builder(). \
    config('spark.jars.packages',
           'org.apache.sedona:sedona-spark-3.3_2.12:1.6.1,'
           'org.datasyslab:geotools-wrapper:1.6.1-28.2'). \
    config('spark.jars.repositories', 'https://artifacts.unidata.ucar.edu/repository/unidata-all'). \
    getOrCreate()
sedona = SedonaContext.create(config)
Sedona < 1.4.1
SedonaRegistrator is deprecated in Sedona 1.4.1 and later versions. Please use the above method instead.
from pyspark.sql import SparkSession
from sedona.register import SedonaRegistrator
from sedona.utils import SedonaKryoRegistrator, KryoSerializer

spark = SparkSession. \
    builder. \
    appName('appName'). \
    config("spark.serializer", KryoSerializer.getName). \
    config("spark.kryo.registrator", SedonaKryoRegistrator.getName). \
    config('spark.jars.packages',
           'org.apache.sedona:sedona-spark-shaded-3.3_2.12:1.6.1,'
           'org.datasyslab:geotools-wrapper:1.6.1-28.2'). \
    getOrCreate()
SedonaRegistrator.registerAll(spark)
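Which of the two styles above applies depends on the installed Sedona version. The following sketch compares a version string against the 1.4.1 cutoff mentioned above; the helper names `parse_version` and `uses_sedona_context` are hypothetical, and pre-release suffixes such as `.dev0` are simply ignored, which is a simplifying assumption.

```python
import re


def parse_version(version):
    """Turn '1.6.1' (or '1.4.1.dev0') into a tuple of its leading integers."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def uses_sedona_context(version):
    """True when the SedonaContext style applies (Sedona >= 1.4.1)."""
    return parse_version(version) >= (1, 4, 1)


if __name__ == "__main__":
    from importlib import metadata

    try:
        installed = metadata.version("apache-sedona")
    except metadata.PackageNotFoundError:
        print("apache-sedona is not installed")
    else:
        style = "SedonaContext" if uses_sedona_context(installed) else "SedonaRegistrator"
        print(f"apache-sedona {installed}: use {style}")
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "1.10.0" as older than "1.4.1".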
Setup environment variables
If you manually copy the sedona-spark-shaded jar to the SPARK_HOME/jars/ folder, you need to set up two environment variables:
- SPARK_HOME. For example, run the following command in your terminal:
export SPARK_HOME=~/Downloads/spark-3.0.1-bin-hadoop2.7
- PYTHONPATH. For example, run the following command in your terminal:
export PYTHONPATH=$SPARK_HOME/python
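To confirm that both variables are visible to Python, a check along the following lines can be run before starting a notebook. This is a sketch under stated assumptions: the function name `check_spark_env` is hypothetical, and it only inspects the variables themselves without verifying that the Spark directory exists.

```python
import os


def check_spark_env(environ=None):
    """Return a list of problems found with SPARK_HOME and PYTHONPATH."""
    env = os.environ if environ is None else environ
    problems = []

    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
        return problems

    # PYTHONPATH is expected to contain $SPARK_HOME/python.
    expected = os.path.normpath(
        os.path.join(os.path.expanduser(spark_home), "python")
    )
    entries = [
        os.path.normpath(os.path.expanduser(entry))
        for entry in env.get("PYTHONPATH", "").split(os.pathsep)
        if entry
    ]
    if expected not in entries:
        problems.append("PYTHONPATH does not include " + expected)
    return problems


if __name__ == "__main__":
    issues = check_spark_env()
    print("Environment looks fine" if not issues else "\n".join(issues))
```

Passing a plain dictionary instead of the real environment makes the check easy to try out without changing any shell settings.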
You can then play with the Sedona Python Jupyter notebook.