Scala/Java
This page outlines the steps to visualize spatial data using SedonaViz. The example code is written in Scala but also works for Java.
SedonaViz provides native support for general cartographic design by extending Sedona to process large-scale spatial data. It can visualize Spatial RDDs and spatial queries and render very high-resolution images in parallel.
SedonaViz offers Map Visualization SQL. This gives users a more flexible way to design beautiful map visualization effects including scatter plots and heat maps. SedonaViz RDD API is also available.
Note
All SedonaViz SQL/DataFrame APIs are explained in SedonaViz API. For a complete example, see the Viz example project.
Why scalable map visualization?
Data visualization allows users to summarize, analyze and reason about data. Guaranteeing detailed and accurate geospatial map visualization (e.g., at multiple zoom levels) requires extremely high-resolution maps. Classic visualization solutions such as Google Maps, MapBox and ArcGIS suffer from limited computation resources and hence take a tremendous amount of time to generate maps for large-scale geospatial data. In big spatial data scenarios, these tools just crash or run forever.
SedonaViz encapsulates the main steps of the map visualization process (pixelize, aggregate, and render) into a set of massively parallelized GeoViz operators that users can assemble into customized styles.
Visualize SpatialRDD
This tutorial focuses on the SQL/DataFrame API. A SedonaViz RDD example can be found in the Viz example project.
Set up dependencies
- Read Sedona Maven Central coordinates
- Add Apache Spark core, Apache SparkSQL, Sedona-core, Sedona-SQL, Sedona-Viz
Create Sedona config
Use the following code to create your Sedona config at the beginning. If you already have a SparkSession (usually named spark) created by Wherobots/AWS EMR/Databricks, skip this step and use spark directly.
Sedona >= 1.4.1
import org.apache.sedona.spark.SedonaContext
import org.apache.sedona.viz.core.Serde.SedonaVizKryoRegistrator

val config = SedonaContext.builder()
  // Register the SedonaViz Kryo serializers
  .config("spark.kryo.registrator", classOf[SedonaVizKryoRegistrator].getName)
  .master("local[*]") // Delete this if run in cluster mode
  .appName("Sedona Viz") // Change this to a proper name
  .getOrCreate()
Sedona <1.4.1
The following method has been deprecated since Sedona 1.4.1. Please use the method above to create your Sedona config.
var sparkSession = SparkSession.builder()
.master("local[*]") // Delete this if run in cluster mode
.appName("Sedona Viz") // Change this to a proper name
// Enable Sedona custom Kryo serializer
.config("spark.serializer", classOf[KryoSerializer].getName) // org.apache.spark.serializer.KryoSerializer
.config("spark.kryo.registrator", classOf[SedonaVizKryoRegistrator].getName) // org.apache.sedona.viz.core.Serde.SedonaVizKryoRegistrator
.getOrCreate()
Initiate SedonaContext
Add the following line after creating the Sedona config. If you already have a SparkSession (usually named spark) created by Wherobots/AWS EMR/Databricks, please call SedonaContext.create(spark) instead.
Sedona >= 1.4.1
val sedona = SedonaContext.create(config)
SedonaVizRegistrator.registerAll(sedona)
Sedona <1.4.1
The following method has been deprecated since Sedona 1.4.1. Please use the method above to create your SedonaContext.
SedonaSQLRegistrator.registerAll(sparkSession)
SedonaVizRegistrator.registerAll(sparkSession)
You can also register everything by passing --conf spark.sql.extensions=org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions to spark-submit or spark-shell.
Create Spatial DataFrame
Suppose you have a DataFrame as follows:
+----------+---------+
|       _c0|      _c1|
+----------+---------+
|-88.331492|32.324142|
|-88.175933|32.360763|
|-88.388954|32.357073|
|-88.221102| 32.35078|
You first need to create a Geometry type column.
CREATE OR REPLACE TEMP VIEW pointtable AS
SELECT ST_Point(cast(pointtable._c0 as Decimal(24,20)),cast(pointtable._c1 as Decimal(24,20))) as shape
FROM pointtable
Sedona provides many different methods to load various spatial data formats. For details, see Write a Spatial DataFrame application.
Generate a single image
In most cases, you just want to see a single image out of your spatial dataset.
Pixelize spatial objects
To put spatial objects on a map image, you first need to convert them to pixels.
First, compute the spatial boundary of this column.
CREATE OR REPLACE TEMP VIEW boundtable AS
SELECT ST_Envelope_Aggr(shape) as bound FROM pointtable
Then use ST_Pixelize to convert them to pixels.
This example is for Sedona before v1.0.1. ST_Pixelize extends Generator, so it can directly flatten the array without the explode function.
CREATE OR REPLACE TEMP VIEW pixels AS
SELECT pixel, shape FROM pointtable
LATERAL VIEW ST_Pixelize(ST_Transform(shape, 'epsg:4326','epsg:3857'), 256, 256, (SELECT ST_Transform(bound, 'epsg:4326','epsg:3857') FROM boundtable)) AS pixel
This example is for Sedona v1.0.1 and later. ST_Pixelize returns an array of pixels, so you need to use explode to flatten it.
CREATE OR REPLACE TEMP VIEW pixels AS
SELECT pixel, shape FROM pointtable
LATERAL VIEW explode(ST_Pixelize(ST_Transform(shape, 'epsg:4326','epsg:3857'), 256, 256, (SELECT ST_Transform(bound, 'epsg:4326','epsg:3857') FROM boundtable))) AS pixel
This will give you a 256*256 resolution image after you run ST_Render at the end of this tutorial.
Warning
We strongly recommend using ST_Transform to transform coordinates to a visualization-specific coordinate system such as epsg:3857; otherwise your map may look distorted.
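To make the pixelization step more concrete, here is a minimal plain-Java sketch of the underlying arithmetic: project longitude/latitude to EPSG:3857 (Web Mercator) and then scale a coordinate into a pixel index along one axis. It is only an illustration and not how ST_Pixelize or ST_Transform are implemented; the class and method names (MercatorPixelSketch, toPixelIndex) are hypothetical, and the image y-axis orientation is ignored.

```java
final class MercatorPixelSketch {
    // Sphere radius used by EPSG:3857 (Web Mercator), in meters.
    static final double R = 6378137.0;

    /** Longitude in degrees -> Web Mercator x in meters. */
    static double mercatorX(double lonDeg) {
        return R * Math.toRadians(lonDeg);
    }

    /** Latitude in degrees -> Web Mercator y in meters. */
    static double mercatorY(double latDeg) {
        return R * Math.log(Math.tan(Math.PI / 4 + Math.toRadians(latDeg) / 2));
    }

    /** Scales a value in [min, max] to a pixel index in [0, resolution - 1]. */
    static int toPixelIndex(double value, double min, double max, int resolution) {
        if (max <= min) {
            return 0; // degenerate boundary: everything lands on pixel 0
        }
        int idx = (int) Math.floor((value - min) / (max - min) * resolution);
        // Clamp so that value == max maps to the last pixel instead of overflowing.
        return Math.min(Math.max(idx, 0), resolution - 1);
    }

    public static void main(String[] args) {
        // Longitude range of the four sample rows shown earlier on this page.
        double minX = mercatorX(-88.388954);
        double maxX = mercatorX(-88.175933);
        int px = toPixelIndex(mercatorX(-88.331492), minX, maxX, 256);
        System.out.println("x pixel of the first sample point: " + px);
    }
}
```

Because Web Mercator stretches distances unevenly with latitude, scaling raw longitude/latitude degrees directly into pixels gives a different picture, which is the distortion the warning above refers to.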
Aggregate pixels
Many objects may be pixelized to the same pixel locations. You now need to aggregate them based on either their spatial aggregation or spatial observations such as temperature or humidity.
CREATE OR REPLACE TEMP VIEW pixelaggregates AS
SELECT pixel, count(*) as weight
FROM pixels
GROUP BY pixel
The weight indicates the degree of spatial aggregation or spatial observations. Later on, it will determine the color of this pixel.
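The GROUP BY above is an ordinary count per key. As a rough single-machine analogy (not SedonaViz code), the following Java sketch counts how many objects land on each pixel; the class name PixelAggregationSketch and the "x,y" string key are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

final class PixelAggregationSketch {
    /**
     * Counts how many objects fall on each pixel.
     * Each input entry is {x, y}; the result maps "x,y" to its weight.
     */
    static Map<String, Integer> countPerPixel(int[][] pixels) {
        Map<String, Integer> weights = new HashMap<>();
        for (int[] p : pixels) {
            // merge() inserts 1 for a new pixel or adds 1 to the existing count.
            weights.merge(p[0] + "," + p[1], 1, Integer::sum);
        }
        return weights;
    }

    public static void main(String[] args) {
        int[][] pixels = {{1, 2}, {1, 2}, {3, 4}};
        System.out.println(countPerPixel(pixels));
    }
}
```

For spatial observations such as temperature, the count would be replaced by another aggregate (for example an average) over the observed values.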
Colorize pixels
Run the following command to assign colors for pixels based on their weights.
CREATE OR REPLACE TEMP VIEW pixelaggregates AS
SELECT pixel, ST_Colorize(weight, (SELECT max(weight) FROM pixelaggregates)) as color
FROM pixelaggregates
Please read ST_Colorize for a detailed API description.
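As an intuition for why the maximum weight is passed in, the sketch below normalizes a weight against the maximum and maps it onto a simple blue-to-red ramp. This is an assumed color scheme chosen for illustration; it is not the actual mapping used by ST_Colorize, and the class name ColorRampSketch is hypothetical.

```java
final class ColorRampSketch {
    /**
     * Maps a weight to an opaque ARGB color on a linear blue-to-red ramp.
     * weight == 0 gives pure blue, weight == maxWeight gives pure red.
     */
    static int colorize(long weight, long maxWeight) {
        double t = maxWeight <= 0
                ? 0.0
                : Math.min(1.0, Math.max(0.0, (double) weight / maxWeight));
        int red = (int) Math.round(255 * t);
        int blue = 255 - red;
        // Pack alpha (fully opaque), red, green (0), and blue into one int.
        return (0xFF << 24) | (red << 16) | blue;
    }

    public static void main(String[] args) {
        System.out.println(String.format("%08X", colorize(5, 10)));
    }
}
```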
Render the image
Use ST_Render to plot all pixels on a single image.
CREATE OR REPLACE TEMP VIEW images AS
SELECT ST_Render(pixel, color) AS image, (SELECT ST_AsText(bound) FROM boundtable) AS boundary
FROM pixelaggregates
This DataFrame will contain an Image type column which has only one image.
Store the image on disk
Fetch the image from the previous DataFrame:
var image = sedona.table("images").take(1)(0)(0).asInstanceOf[ImageSerializableWrapper].getImage
Use Sedona Viz ImageGenerator to store this image on disk.
var imageGenerator = new ImageGenerator
imageGenerator.SaveRasterImageAsLocalFile(image, System.getProperty("user.dir")+"/target/points", ImageType.PNG)
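Conceptually, rendering paints each colored pixel onto a canvas and storing encodes that canvas as an image file. The following sketch shows the same idea with only the standard Java imaging classes; it does not use SedonaViz's ImageGenerator, and the class name RenderSketch and the {x, y, argb} triple layout are assumptions for illustration.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

final class RenderSketch {
    /** Paints each {x, y, argb} triple onto an initially transparent canvas. */
    static BufferedImage render(int width, int height, int[][] coloredPixels) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        for (int[] p : coloredPixels) {
            // Skip pixels that fall outside the canvas.
            if (p[0] >= 0 && p[0] < width && p[1] >= 0 && p[1] < height) {
                image.setRGB(p[0], p[1], p[2]);
            }
        }
        return image;
    }

    /** Encodes the image as PNG bytes. */
    static byte[] toPng(BufferedImage image) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            if (!ImageIO.write(image, "png", out)) {
                throw new IOException("No PNG writer available");
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BufferedImage image = render(256, 256, new int[][]{{10, 20, 0xFFFF0000}});
        System.out.println("PNG size in bytes: " + toPng(image).length);
    }
}
```

The resulting bytes can be written to disk with java.nio.file.Files.write.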
Generate map tiles
If you are a map professional, you may need to generate map tiles for different zoom levels and eventually create the map tile layer.
Pixelization and pixel aggregation
First perform pixelization and pixel aggregation using the same commands as in single image generation. In ST_Pixelize, you need to specify a very high resolution, such as 1000*1000. Note that each dimension should be divisible by 2^zoom-level.
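The divisibility rule is easy to check before running the pipeline. The helper below is a small sketch (the class and method names are hypothetical, not part of SedonaViz) that validates a resolution against a zoom level and, if needed, rounds it up to the next valid value. It assumes the usual tile pyramid with 2^zoom-level tiles along each side.

```java
final class TileResolutionCheck {
    /** Number of tiles along one side at the given zoom level: 2^zoomLevel. */
    static int tilesPerSide(int zoomLevel) {
        return 1 << zoomLevel;
    }

    /** True if both dimensions split evenly into 2^zoomLevel tiles. */
    static boolean isValidResolution(int width, int height, int zoomLevel) {
        int n = tilesPerSide(zoomLevel);
        return width % n == 0 && height % n == 0;
    }

    /** Smallest value >= resolution that is divisible by 2^zoomLevel. */
    static int roundUp(int resolution, int zoomLevel) {
        int n = tilesPerSide(zoomLevel);
        return ((resolution + n - 1) / n) * n;
    }

    public static void main(String[] args) {
        System.out.println(isValidResolution(1000, 1000, 3)); // 1000 / 8 = 125 per tile
        System.out.println(roundUp(1000, 4));                 // 1000 is not divisible by 16
    }
}
```

For example, 1000*1000 works at zoom level 3 (8 tiles per side, 125 pixels each) but not at zoom level 4, where 1008*1008 would be the next valid size.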
Create tile name
Run the following command to compute the tile name for every pixel:
CREATE OR REPLACE TEMP VIEW pixelaggregates AS
SELECT pixel, weight, ST_TileName(pixel, 3) AS pid
FROM pixelaggregates
"3" is the zoom level for these map tiles.
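To illustrate what assigning a pixel to a tile means, the sketch below computes which tile a pixel belongs to by dividing the image into 2^zoom-level tiles per side. The "zoom-x-y" name format and the class name TileNameSketch are assumptions for illustration; the exact string produced by ST_TileName is not specified on this page.

```java
final class TileNameSketch {
    /**
     * Returns a hypothetical "zoom-tileX-tileY" name for the pixel (x, y)
     * in an image of the given size, assuming 2^zoomLevel tiles per side
     * and dimensions divisible by 2^zoomLevel.
     */
    static String tileName(int x, int y, int width, int height, int zoomLevel) {
        int tilesPerSide = 1 << zoomLevel;
        int tileWidth = width / tilesPerSide;
        int tileHeight = height / tilesPerSide;
        int tileX = x / tileWidth;   // integer division selects the tile column
        int tileY = y / tileHeight;  // integer division selects the tile row
        return zoomLevel + "-" + tileX + "-" + tileY;
    }

    public static void main(String[] args) {
        // With a 1000*1000 image at zoom level 3, each tile is 125*125 pixels.
        System.out.println(tileName(130, 260, 1000, 1000, 3));
    }
}
```

All pixels that share a tile name end up in the same group when the tiles are rendered.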
Colorize pixels
Use the same command explained in single image generation to assign colors.
Render map tiles
You now need to group pixels by tiles and then render map tile images in parallel.
CREATE OR REPLACE TEMP VIEW images AS
SELECT ST_Render(pixel, color, 3) AS image
FROM pixelaggregates
GROUP BY pid
"3" is the zoom level for these map tiles.
Store map tiles on disk
You can use the same commands in single image generation to fetch all map tiles and store them one by one.