Skip to content

Pure SQL

Starting from Sedona v1.0.1, you can use Sedona in a pure Spark SQL environment. The example code is written in SQL.

SedonaSQL supports SQL/MM Part3 Spatial SQL Standard. Detailed SedonaSQL APIs are available here: SedonaSQL API

Initiate Session

Start spark-sql as following (replace <VERSION> with actual version, like, 1.0.1-incubating):

spark-sql --packages org.apache.sedona:sedona-python-adapter-3.0_2.12:<VERSION>,org.apache.sedona:sedona-viz-3.0_2.12:<VERSION>,org.datasyslab:geotools-wrapper:geotools-24.0 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryo.registrator=org.apache.sedona.viz.core.Serde.SedonaVizKryoRegistrator \
  --conf spark.sql.extensions=org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions

This will register all User Defined Tyeps, functions and optimizations in SedonaSQL and SedonaViz.

Load data

Let use data from examples/sql. To load data from CSV file we need to execute two commands:

Use the following code to load the data and create a raw DataFrame:

CREATE TABLE IF NOT EXISTS pointraw (_c0 string, _c1 string) 
USING csv 
OPTIONS(header='false') 
LOCATION '<some path>/incubator-sedona/examples/sql/src/test/resources/testpoint.csv';

CREATE TABLE IF NOT EXISTS polygonraw (_c0 string, _c1 string, _c2 string, _c3 string) 
USING csv 
OPTIONS(header='false') 
LOCATION '<some path>/incubator-sedona/examples/sql/src/test/resources/testenvelope.csv';

Transform the data

We need to transform our point and polygon data into respective types:

CREATE OR REPLACE TEMP VIEW pointdata AS
  SELECT ST_Point(cast(pointraw._c0 as Decimal(24,20)), cast(pointraw._c1 as Decimal(24,20))) AS pointshape
  FROM pointraw;

CREATE OR REPLACE TEMP VIEW polygondata AS
  select ST_PolygonFromEnvelope(cast(polygonraw._c0 as Decimal(24,20)),
        cast(polygonraw._c1 as Decimal(24,20)), cast(polygonraw._c2 as Decimal(24,20)), 
        cast(polygonraw._c3 as Decimal(24,20))) AS polygonshape 
  FROM polygonraw;

Work with data

For example, let join polygon and test data:

SELECT * from polygondata, pointdata 
WHERE ST_Contains(polygondata.polygonshape, pointdata.pointshape) 
      AND ST_Contains(ST_PolygonFromEnvelope(1.0,101.0,501.0,601.0), polygondata.polygonshape)
LIMIT 5;

Last update: May 28, 2021 01:55:27