Pure SQL environment
Starting from Sedona v1.0.1, you can use Sedona in a pure Spark SQL environment. The example code is written in SQL.
SedonaSQL supports SQL/MM Part3 Spatial SQL Standard. Detailed SedonaSQL APIs are available here: SedonaSQL API
Initiate Session¶
Start spark-sql
as following (replace <VERSION>
with actual version like 1.5.1
):
Run spark-sql with Apache Sedona
spark-sql --packages org.apache.sedona:sedona-spark-shaded-3.0_2.12:<VERSION>,org.datasyslab:geotools-wrapper:<VERSION>-28.2 \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.kryo.registrator=org.apache.sedona.viz.core.Serde.SedonaVizKryoRegistrator \
--conf spark.sql.extensions=org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions
spark-sql --packages org.apache.sedona:sedona-spark-shaded-3.4_2.12:<VERSION>,org.datasyslab:geotools-wrapper:<VERSION>-28.2 \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.kryo.registrator=org.apache.sedona.viz.core.Serde.SedonaVizKryoRegistrator \
--conf spark.sql.extensions=org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions
3.4
in artifact names with the corresponding major.minor version of Spark.
This will register all Sedona types, functions and optimizations in SedonaSQL and SedonaViz.
Load data¶
Let use data from examples/sql
. To load data from CSV file we need to execute two commands:
Use the following code to load the data and create a raw DataFrame:
CREATE TABLE IF NOT EXISTS pointraw (_c0 string, _c1 string)
USING csv
OPTIONS(header='false')
LOCATION '<some path>/sedona/examples/sql/src/test/resources/testpoint.csv';
CREATE TABLE IF NOT EXISTS polygonraw (_c0 string, _c1 string, _c2 string, _c3 string)
USING csv
OPTIONS(header='false')
LOCATION '<some path>/sedona/examples/sql/src/test/resources/testenvelope.csv';
Transform the data¶
We need to transform our point and polygon data into respective types:
CREATE OR REPLACE TEMP VIEW pointdata AS
SELECT ST_Point(cast(pointraw._c0 as Decimal(24,20)), cast(pointraw._c1 as Decimal(24,20))) AS pointshape
FROM pointraw;
CREATE OR REPLACE TEMP VIEW polygondata AS
select ST_PolygonFromEnvelope(cast(polygonraw._c0 as Decimal(24,20)),
cast(polygonraw._c1 as Decimal(24,20)), cast(polygonraw._c2 as Decimal(24,20)),
cast(polygonraw._c3 as Decimal(24,20))) AS polygonshape
FROM polygonraw;
Work with data¶
For example, let join polygon and test data:
SELECT * from polygondata, pointdata
WHERE ST_Contains(polygondata.polygonshape, pointdata.pointshape)
AND ST_Contains(ST_PolygonFromEnvelope(1.0,101.0,501.0,601.0), polygondata.polygonshape)
LIMIT 5;