Raster loader
Note
Sedona loaders are available in Scala, Java and Python and have the same APIs.
Sedona provides two types of raster DataFrame loaders. Both read rasters through a Spark data source but load the images into different internal formats.
Load any raster to RasterUDT format
Sedona's raster loader leverages Spark's built-in binary data source and works with several RS constructors to produce rasters of RasterUDT type. Each raster is a row in the resulting DataFrame and is stored in RasterUDT format.
Load raster to a binary DataFrame
You can load any type of raster data using the code below, then apply one of the RS constructors that follow to create a RasterUDT column.
spark.read.format("binaryFile").load("/some/path/*.asc")
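For reference, the binaryFile data source produces one row per file; the raw bytes land in the content column, which is what the RS constructors consume. A minimal Scala sketch (the input path is a placeholder):
// The binary data source yields one row per file with four columns:
//   path (string), modificationTime (timestamp), length (long), content (binary)
val binaryDF = spark.read.format("binaryFile").load("/some/path/*.asc")
binaryDF.printSchema()
// Pass the raw bytes in `content` to an RS constructor such as RS_FromArcInfoAsciiGrid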
RS_FromArcInfoAsciiGrid
Introduction: Returns a raster geometry from an Arc/Info ASCII Grid file.
Format: RS_FromArcInfoAsciiGrid(asc: Array[Byte])
Since: v1.4.0
Spark SQL example:
import org.apache.spark.sql.functions.expr

var df = spark.read.format("binaryFile").load("/some/path/*.asc")
df = df.withColumn("raster", expr("RS_FromArcInfoAsciiGrid(content)"))
RS_FromGeoTiff
Introduction: Returns a raster geometry from a GeoTiff file.
Format: RS_FromGeoTiff(geoTiff: Array[Byte])
Since: v1.4.0
Spark SQL example:
var df = spark.read.format("binaryFile").load("/some/path/*.tiff")
df = df.withColumn("raster", expr("RS_FromGeoTiff(content)"))
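When loading a whole directory of rasters, Spark's general file source options such as pathGlobFilter and recursiveFileLookup also apply to the binaryFile reader. A sketch, assuming GeoTiff files spread across nested subdirectories of /some/path (the path is a placeholder):
var df = spark.read.format("binaryFile")
  .option("pathGlobFilter", "*.tiff")        // only pick up files ending in .tiff
  .option("recursiveFileLookup", "true")     // descend into subdirectories
  .load("/some/path")
df = df.withColumn("raster", expr("RS_FromGeoTiff(content)"))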
RS_MakeEmptyRaster
Introduction: Returns an empty raster geometry. Every band in the raster is initialized to 0.0.
Since: v1.4.1
Format: RS_MakeEmptyRaster(numBands: Int, width: Int, height: Int, upperleftX: Double, upperleftY: Double, cellSize: Double)
- NumBands: The number of bands in the raster. If not specified, the raster will have a single band.
- Width: The width of the raster in pixels.
- Height: The height of the raster in pixels.
- UpperleftX: The X coordinate of the upper left corner of the raster, in terms of the CRS units.
- UpperleftY: The Y coordinate of the upper left corner of the raster, in terms of the CRS units.
- Cell Size (pixel size): The size of the cells in the raster, in terms of the CRS units.
This variant uses the default Cartesian coordinate system (SRID 0).
Format: RS_MakeEmptyRaster(numBands: Int, width: Int, height: Int, upperleftX: Double, upperleftY: Double, scaleX: Double, scaleY: Double, skewX: Double, skewY: Double, srid: Int)
- NumBands: The number of bands in the raster. If not specified, the raster will have a single band.
- Width: The width of the raster in pixels.
- Height: The height of the raster in pixels.
- UpperleftX: The X coordinate of the upper left corner of the raster, in terms of the CRS units.
- UpperleftY: The Y coordinate of the upper left corner of the raster, in terms of the CRS units.
- ScaleX (pixel size on X): The size of the cells on the X axis, in terms of the CRS units.
- ScaleY (pixel size on Y): The size of the cells on the Y axis, in terms of the CRS units.
- SkewX: The skew of the raster on the X axis, in terms of the CRS units.
- SkewY: The skew of the raster on the Y axis, in terms of the CRS units.
- SRID: The SRID of the raster. Use 0 if you want to use the default Cartesian coordinate system. Use 4326 if you want to use WGS84.
SQL example 1 (with 2 bands):
SELECT RS_MakeEmptyRaster(2, 10, 10, 0.0, 0.0, 1.0) as raster
Output:
+--------------------+
|              raster|
+--------------------+
|GridCoverage2D["g...|
+--------------------+
SQL example 2 (with 2 bands, scale, skew, and SRID):
SELECT RS_MakeEmptyRaster(2, 10, 10, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 4326) as raster
Output:
+--------------------+
|              raster|
+--------------------+
|GridCoverage2D["g...|
+--------------------+
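The same constructor works through the DataFrame API as well. A minimal Scala sketch, assuming the Sedona SQL functions are already registered with the Spark session:
// Build a 2-band, 10 x 10 empty raster in WGS84; all cells start at 0.0
val emptyRasterDF = spark.sql(
  "SELECT RS_MakeEmptyRaster(2, 10, 10, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 4326) AS raster")
emptyRasterDF.printSchema()   // the raster column uses the RasterUDT type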
Load GeoTiff to Array[Double] format
Warning
This loader has been deprecated since v1.4.1. Please use the binaryFile data source together with RS_FromGeoTiff to read GeoTiff files instead.
The geotiff loader is a Sedona built-in Spark data source. It can read a single GeoTiff image or a directory of GeoTiff images into a DataFrame. Each GeoTiff is a row in the resulting DataFrame and is stored as an array of Double values.
Since: v1.1.0
Spark SQL example:
The input path can point to a single GeoTiff image or a directory of GeoTiff images. You can optionally set the dropInvalid option to true to drop invalid images. The geometry bound of each image is automatically loaded as a Sedona geometry and is transformed to the WGS84 (EPSG:4326) reference system.
var geotiffDF = sparkSession.read.format("geotiff").option("dropInvalid", true).load("YOUR_PATH")
geotiffDF.printSchema()
Output:
 |-- image: struct (nullable = true)
 |    |-- origin: string (nullable = true)
 |    |-- geometry: string (nullable = true)
 |    |-- height: integer (nullable = true)
 |    |-- width: integer (nullable = true)
 |    |-- nBands: integer (nullable = true)
 |    |-- data: array (nullable = true)
 |    |    |-- element: double (containsNull = true)
There are three more optional parameters for reading GeoTiff files:
|-- readFromCRS: Coordinate reference system of the geometry coordinates representing the location of the GeoTiff. An example value of readFromCRS is EPSG:4326.
|-- readToCRS: If you want to transform the GeoTiff location geometry coordinates to a different coordinate reference system, you can define the target coordinate reference system with this option.
|-- disableErrorInCRS: (Default value: false) Indicates whether to ignore errors in CRS transformation.
An example with all GeoTiff read options:
var geotiffDF = sparkSession.read.format("geotiff").option("dropInvalid", true).option("readFromCRS", "EPSG:4499").option("readToCRS", "EPSG:4326").option("disableErrorInCRS", true).load("YOUR_PATH")
geotiffDF.printSchema()
Output:
 |-- image: struct (nullable = true)
 |    |-- origin: string (nullable = true)
 |    |-- geometry: string (nullable = true)
 |    |-- height: integer (nullable = true)
 |    |-- width: integer (nullable = true)
 |    |-- nBands: integer (nullable = true)
 |    |-- data: array (nullable = true)
 |    |    |-- element: double (containsNull = true)
You can also select the sub-attributes individually to construct a new DataFrame:
geotiffDF = geotiffDF.selectExpr("image.origin as origin","ST_GeomFromWkt(image.geometry) as Geom", "image.height as height", "image.width as width", "image.data as data", "image.nBands as bands")
geotiffDF.createOrReplaceTempView("GeotiffDataframe")
geotiffDF.show()
Output:
+--------------------+--------------------+------+-----+--------------------+-----+
|              origin|                Geom|height|width|                data|bands|
+--------------------+--------------------+------+-----+--------------------+-----+
|file:///home/hp/D...|POLYGON ((-58.699...|    32|   32|[1058.0, 1039.0, ...|    4|
|file:///home/hp/D...|POLYGON ((-58.297...|    32|   32|[1258.0, 1298.0, ...|    4|
+--------------------+--------------------+------+-----+--------------------+-----+
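Since Geom is a regular Sedona geometry, the usual vector predicates can be applied to the image footprints. A sketch (not part of the original example, with a placeholder query polygon) that keeps only images intersecting an area of interest:
// Filter GeoTiff rows by their geographic bound
val matchingDF = spark.sql(
  "SELECT origin, Geom FROM GeotiffDataframe " +
  "WHERE ST_Intersects(Geom, ST_GeomFromWkt('POLYGON ((-60 -35, -58 -35, -58 -33, -60 -33, -60 -35))'))")
matchingDF.show()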
RS_Array
Introduction: Create an array that is filled with the given value.
Format: RS_Array(length:Int, value: Decimal)
Since: v1.1.0
Spark SQL example:
SELECT RS_Array(height * width, 0.0)
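As written, the expression needs a source table that supplies the height and width columns. A runnable sketch using the GeotiffDataframe view registered above (the zeroBand alias is illustrative):
// Build a zero-filled array sized to each image (height * width elements per row)
val zeroArrayDF = spark.sql(
  "SELECT RS_Array(height * width, 0.0) AS zeroBand FROM GeotiffDataframe")
zeroArrayDF.show()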
RS_GetBand
Introduction: Returns a particular band from a GeoTiff DataFrame. The total number of bands can be obtained from the GeoTiff loader.
Format: RS_GetBand (allBandValues: Array[Double], targetBand:Int, totalBands:Int)
Since: v1.1.0
Note
The targetBand index is 1-based: the first band is band 1, not band 0.
Spark SQL example:
val BandDF = spark.sql("select RS_GetBand(data, 2, bands) as targetBand from GeotiffDataframe")
BandDF.show()
Output:
+--------------------+
|          targetBand|
+--------------------+
|[1058.0, 1039.0, ...|
|[1258.0, 1298.0, ...|
+--------------------+
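Multiple bands can be extracted in one query in the same way. A sketch reusing the GeotiffDataframe view and its bands column (the band1/band2 aliases are illustrative):
// Pull the first and second band of every image as separate array columns
val bandsDF = spark.sql(
  "SELECT RS_GetBand(data, 1, bands) AS band1, RS_GetBand(data, 2, bands) AS band2 " +
  "FROM GeotiffDataframe")
bandsDF.show()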