Install on AWS EMR

We recommend Sedona 1.3.1-incubating or later for EMR. This tutorial uses AWS Elastic MapReduce (EMR) 6.9.0, which ships with the following applications: Hadoop 3.3.3, JupyterEnterpriseGateway 2.6.0, Livy 0.7.1, and Spark 3.3.0.

Tip

Wherobots Cloud provides a free tool to deploy Apache Sedona to AWS EMR. Please sign up here.

This tutorial was tested on EMR on EC2 with EMR Studio (notebooks). EMR on EC2 uses YARN to manage resources.

Note

If you are using Spark 3.4+ and Scala 2.12, please use sedona-spark-shaded-3.4_2.12. Make sure the Spark version suffix and the Scala version suffix in the artifact name match your cluster.
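The naming rule in the note can be sketched as a small shell helper. This is only an illustration: the function names `sedona_spark_suffix` and `sedona_artifact` are hypothetical, and the mapping assumes just the two suffixes mentioned on this page (`3.0` for Spark 3.x releases before 3.4, `3.4` for Spark 3.4+). Check the artifact list for your Sedona release before relying on it.

```shell
#!/bin/bash

# Hypothetical helper: map a Spark 3.x version to the Sedona artifact suffix.
# Assumption taken from the note above: Spark 3.4+ uses "3.4";
# earlier Spark 3.x releases use the "3.0" suffix used in this tutorial.
sedona_spark_suffix() {
  local spark_version="$1"
  local major minor
  major="${spark_version%%.*}"   # "3.3.0" -> "3"
  minor="${spark_version#*.}"    # "3.3.0" -> "3.0"
  minor="${minor%%.*}"           # "3.0"   -> "3"
  if [ "$major" -ne 3 ]; then
    echo "unsupported Spark major version: $major" >&2
    return 1
  fi
  if [ "$minor" -ge 4 ]; then
    echo "3.4"
  else
    echo "3.0"
  fi
}

# Hypothetical helper: build the shaded artifact name for a Spark/Scala pair.
sedona_artifact() {
  local suffix
  suffix="$(sedona_spark_suffix "$1")" || return 1
  echo "sedona-spark-shaded-${suffix}_$2"
}

sedona_artifact "3.3.0" "2.12"   # prints sedona-spark-shaded-3.0_2.12
sedona_artifact "3.4.1" "2.12"   # prints sedona-spark-shaded-3.4_2.12
```

The resulting name can be substituted into both the download URL and the jar path so the two stay consistent.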

Prepare initialization script

In your S3 bucket, add a script that has the following content:

#!/bin/bash

# EMR nodes only have ephemeral local storage, so the jar location is arbitrary.
# -p keeps the script from failing if the directory already exists.
sudo mkdir -p /jars

# Download Sedona jar
sudo curl -o /jars/sedona-spark-shaded-3.0_2.12-1.5.1.jar "https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.0_2.12/1.5.1/sedona-spark-shaded-3.0_2.12-1.5.1.jar"

# Download GeoTools jar
sudo curl -o /jars/geotools-wrapper-1.5.1-28.2.jar "https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/1.5.1-28.2/geotools-wrapper-1.5.1-28.2.jar"

# Install necessary python libraries
sudo python3 -m pip install pandas==1.3.5
sudo python3 -m pip install shapely==1.8.5
sudo python3 -m pip install geopandas==0.11.1
sudo python3 -m pip install keplergl==0.3.2
sudo python3 -m pip install pydeck==0.8.0
sudo python3 -m pip install attrs matplotlib descartes apache-sedona==1.5.1

When you create an EMR cluster, specify the S3 location of this script as a bootstrap action.
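If you create the cluster from the AWS CLI rather than the console, the same step might look like the sketch below. The bucket name, object key, local file name `sedona-init.sh`, cluster name, instance type, and instance count are all placeholders, and `--use-default-roles` assumes the default EMR roles already exist in your account. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/bash

# Placeholder values: replace with your own bucket and object key.
BUCKET="my-sedona-bucket"
SCRIPT_KEY="bootstrap/sedona-init.sh"
SCRIPT_URI="s3://${BUCKET}/${SCRIPT_KEY}"

# Print the command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Upload the initialization script (assumed to be saved locally as sedona-init.sh).
run aws s3 cp sedona-init.sh "${SCRIPT_URI}"

# Create the cluster and register the script as a bootstrap action.
run aws emr create-cluster \
  --name "sedona-emr" \
  --release-label "emr-6.9.0" \
  --applications Name=Hadoop Name=Spark Name=Livy Name=JupyterEnterpriseGateway \
  --instance-type "m5.xlarge" \
  --instance-count 3 \
  --use-default-roles \
  --bootstrap-actions "Path=${SCRIPT_URI},Name=InstallSedona"
```

Review the printed commands first, then rerun with `DRY_RUN=0` once the values match your environment.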

Add software configuration

When you create an EMR cluster, add the following content to the software configuration:

[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.yarn.dist.jars": "/jars/sedona-spark-shaded-3.0_2.12-1.5.1.jar,/jars/geotools-wrapper-1.5.1-28.2.jar",
      "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
      "spark.kryo.registrator": "org.apache.sedona.core.serde.SedonaKryoRegistrator",
      "spark.sql.extensions": "org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions"
    }
  }
]
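The jar paths in `spark.yarn.dist.jars` must match the files downloaded by the initialization script. One way to keep them in sync is to generate the configuration from the same version variables, as in this sketch. The variable names and the output file name `sedona-config.json` are illustrative assumptions; the property values are the ones shown above.

```shell
#!/bin/bash

# Versions used in this tutorial; change them together with the download URLs.
SEDONA_VERSION="1.5.1"
GEOTOOLS_VERSION="1.5.1-28.2"
ARTIFACT="sedona-spark-shaded-3.0_2.12"

# Hypothetical output file name.
CONFIG_FILE="sedona-config.json"

# Write the spark-defaults classification with the jar paths filled in.
cat > "${CONFIG_FILE}" <<EOF
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.yarn.dist.jars": "/jars/${ARTIFACT}-${SEDONA_VERSION}.jar,/jars/geotools-wrapper-${GEOTOOLS_VERSION}.jar",
      "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
      "spark.kryo.registrator": "org.apache.sedona.core.serde.SedonaKryoRegistrator",
      "spark.sql.extensions": "org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions"
    }
  }
]
EOF

echo "Wrote ${CONFIG_FILE}"
```

The generated file can be pasted into the console, or passed to `aws emr create-cluster` with `--configurations file://sedona-config.json` if you use the AWS CLI.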

Note

If you use Sedona 1.3.1-incubating, please use the sedona-python-adapter-3.0_2.12 jar in the content above instead of sedona-spark-shaded-3.0_2.12.


Last update: January 1, 2024 09:17:53