Raster input and output
GeoTiff DataFrame Loader
Introduction: Sedona's GeoTiff loader works as a Spark data source. It can read a single GeoTiff image or a directory of GeoTiff images into a DataFrame.
Since: v1.1.0
Spark SQL example:
The input path can be a single GeoTiff image or a directory of GeoTiff images. You can optionally set the dropInvalid option to drop invalid images. The geometry bound of each image is loaded automatically as a Sedona geometry and transformed to the WGS84 (EPSG:4326) reference system.
var geotiffDF = sparkSession.read.format("geotiff").option("dropInvalid", true).load("YOUR_PATH")
|-- image: struct (nullable = true)
| |-- origin: string (nullable = true)
| |-- Geometry: geometry (nullable = true)
| |-- height: integer (nullable = true)
| |-- width: integer (nullable = true)
| |-- nBands: integer (nullable = true)
| |-- data: array (nullable = true)
| | |-- element: double (containsNull = true)
There are three more optional parameters for reading GeoTiff:
|-- readFromCRS: Coordinate reference system of the geometry coordinates representing the location of the GeoTiff. An example value of readFromCRS is EPSG:4326.
|-- readToCRS: Target coordinate reference system. Set this option if you want to transform the GeoTiff location geometry coordinates to a different coordinate reference system.
|-- disableErrorInCRS: (Default value false) => Indicates whether to ignore errors in CRS transformation.
An example with all GeoTiff read options:
var geotiffDF = sparkSession.read.format("geotiff").option("dropInvalid", true).option("readFromCRS", "EPSG:4499").option("readToCRS", "EPSG:4326").option("disableErrorInCRS", true).load("YOUR_PATH")
|-- image: struct (nullable = true)
| |-- origin: string (nullable = true)
| |-- Geometry: geometry (nullable = true)
| |-- height: integer (nullable = true)
| |-- width: integer (nullable = true)
| |-- nBands: integer (nullable = true)
| |-- data: array (nullable = true)
| | |-- element: double (containsNull = true)
You can also select sub-attributes individually to construct a new DataFrame:
geotiffDF = geotiffDF.selectExpr("image.origin as origin","ST_GeomFromWkt(image.geometry) as Geom", "image.height as height", "image.width as width", "image.data as data", "image.nBands as bands")
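+--------------------+--------------------+------+-----+--------------------+-----+
|              origin|                Geom|height|width|                data|bands|
+--------------------+--------------------+------+-----+--------------------+-----+
|file:///home/hp/D...|POLYGON ((-58.699...|    32|   32|[1058.0, 1039.0, ...|    4|
|file:///home/hp/D...|POLYGON ((-58.297...|    32|   32|[1258.0, 1298.0, ...|    4|
+--------------------+--------------------+------+-----+--------------------+-----+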
RS_GetBand
Introduction: Return a particular band from a GeoTiff DataFrame.
The total number of bands can be obtained from the nBands field produced by the GeoTiff loader.
Format: RS_GetBand (allBandValues: Array[Double], targetBand:Int, totalBands:Int)
Since: v1.1.0
targetBand is 1-based: the first band has index 1, not 0.
Spark SQL example:
val BandDF = spark.sql("select RS_GetBand(data, 2, Band) as targetBand from GeotiffDataframe")
+--------------------+
|          targetBand|
+--------------------+
|[1058.0, 1039.0, ...|
|[1258.0, 1298.0, ...|
+--------------------+
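To make the 1-based indexing concrete, here is a minimal plain-Scala sketch of what selecting a band from a flattened array could look like. It assumes the data array stores pixel values band after band in equally sized contiguous blocks; that layout, and the names `BandSketch` and `getBand`, are illustrative assumptions and not Sedona's actual implementation.

```scala
object BandSketch {
  // Assumption: values are stored band after band in one flat array,
  // so each band occupies data.length / totalBands consecutive values.
  def getBand(data: Array[Double], targetBand: Int, totalBands: Int): Array[Double] = {
    require(totalBands > 0 && data.length % totalBands == 0, "data must split evenly into bands")
    require(targetBand >= 1 && targetBand <= totalBands, "targetBand is 1-based")
    val bandLength = data.length / totalBands
    // Band 1 starts at offset 0, band 2 at bandLength, and so on.
    data.slice((targetBand - 1) * bandLength, targetBand * bandLength)
  }
}
```

For a six-value array with three bands, `BandSketch.getBand(data, 1, 3)` would return the first two values and `BandSketch.getBand(data, 3, 3)` the last two.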
RS_Array
Introduction: Create an array filled with the given value
Format: RS_Array(length:Int, value: Decimal)
Since: v1.1.0
Spark SQL example:
SELECT RS_Array(height * width, 0.0)
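For readers more familiar with Scala collections than with SQL functions, the following sketch shows a plain-Scala counterpart of the call above. The names `ArraySketch` and `filled` are hypothetical and only illustrate the idea of a constant-filled band.

```scala
object ArraySketch {
  // Plain-Scala counterpart of RS_Array(length, value):
  // an array holding `length` copies of `value`.
  def filled(length: Int, value: Double): Array[Double] =
    Array.fill(length)(value)
}
```

A zeroed band for a 32 x 32 image would then be `ArraySketch.filled(32 * 32, 0.0)`, an array of 1024 zeros.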
RS_Base64
Introduction: Return a Base64 string from a GeoTiff image
Format: RS_Base64 (height:Int, width:Int, redBand: Array[Double], greenBand: Array[Double], blueBand: Array[Double], optional: alphaBand: Array[Double])
Since: v1.1.0
Spark SQL example:
val BandDF = spark.sql("select RS_Base64(h, w, band1, band2, RS_Array(h*w, 0.0)) as baseString from dataframe")
| baseString|
All three RGB bands are mandatory. If a band is not available, you can pass RS_Array(h*w, 0.0) in its place, which creates a zeroed-out array of size h * w.
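The sketch below illustrates, in plain Scala with only JDK classes, how three bands could be combined into an image and encoded as a Base64 string. It is not Sedona's implementation: the PNG encoding, the row-major pixel order, the clamping of values to 0..255, and the names `Base64Sketch` and `encode` are all assumptions made for illustration.

```scala
import java.awt.image.BufferedImage
import java.io.ByteArrayOutputStream
import java.util.Base64
import javax.imageio.ImageIO

object Base64Sketch {
  // Clamp a band value into the 0..255 range of an 8-bit colour channel.
  private def channel(v: Double): Int = math.max(0, math.min(255, v.toInt))

  // Assumptions: each band holds height * width values in row-major order,
  // and the image is encoded as PNG before Base64 encoding.
  def encode(height: Int, width: Int,
             red: Array[Double], green: Array[Double], blue: Array[Double]): String = {
    require(Seq(red, green, blue).forall(_.length == height * width),
      "each band needs height * width values")
    val image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB)
    for (y <- 0 until height; x <- 0 until width) {
      val i = y * width + x
      // Pack the three channels into one 0xRRGGBB integer.
      val rgb = (channel(red(i)) << 16) | (channel(green(i)) << 8) | channel(blue(i))
      image.setRGB(x, y, rgb)
    }
    val out = new ByteArrayOutputStream()
    ImageIO.write(image, "png", out)
    Base64.getEncoder.encodeToString(out.toByteArray)
  }
}
```

Passing a zero-filled array for a missing band, as described above, simply leaves that colour channel at 0 for every pixel.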
RS_HTML
Introduction: Return an HTML img tag with the Base64 string embedded
Format: RS_HTML(base64:String, optional: width_in_px:String)
Spark SQL example:
df.selectExpr("RS_HTML(encodedstring, '300') as htmlstring" ).show()
+--------------------+
|          htmlstring|
+--------------------+
|<img src="data:im...|
|<img src="data:im...|
+--------------------+
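As a rough illustration of the kind of string shown in the output, the following plain-Scala sketch embeds a Base64 string in an img tag using a data URI. The `image/png` MIME type, the default width of 200, and the names `HtmlSketch` and `imgTag` are assumptions; the text above does not state the function's exact output or default.

```scala
object HtmlSketch {
  // Hypothetical default width; the real default is not stated in the text above.
  val DefaultWidthPx = "200"

  // Embed the Base64 payload in a data URI (assumed MIME type: image/png).
  def imgTag(base64: String, widthPx: String = DefaultWidthPx): String =
    s"""<img src="data:image/png;base64,$base64" width="$widthPx" />"""
}
```

For example, `HtmlSketch.imgTag("QUJD", "300")` yields `<img src="data:image/png;base64,QUJD" width="300" />`.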
GeoTiff DataFrame Writer
Introduction: You can write a GeoTiff DataFrame as GeoTiff images using the Spark write feature with the format geotiff.
Since: v1.2.1
Spark SQL example:
The schema of the GeoTiff dataframe to be written can be one of the following two schemas:
|-- image: struct (nullable = true)
| |-- origin: string (nullable = true)
| |-- Geometry: geometry (nullable = true)
| |-- height: integer (nullable = true)
| |-- width: integer (nullable = true)
| |-- nBands: integer (nullable = true)
| |-- data: array (nullable = true)
| | |-- element: double (containsNull = true)
or
|-- origin: string (nullable = true)
|-- Geometry: geometry (nullable = true)
|-- height: integer (nullable = true)
|-- width: integer (nullable = true)
|-- nBands: integer (nullable = true)
|-- data: array (nullable = true)
| |-- element: double (containsNull = true)
Field names can be renamed, but the schema must exactly match one of the two schemas above. The output path is a directory where the GeoTiff images will be saved. If the directory already exists, write should be called in overwrite mode.
var dfToWrite = sparkSession.read.format("geotiff").option("dropInvalid", true).option("readToCRS", "EPSG:4326").load("PATH_TO_INPUT_GEOTIFF_IMAGES")
dfToWrite.write.format("geotiff").save("DESTINATION_PATH")
To overwrite an existing path, call write in overwrite mode:
dfToWrite.write.mode("overwrite").format("geotiff").save("DESTINATION_PATH")
You can also extract the columns nested within the image column and write the DataFrame as GeoTiff images.
dfToWrite = dfToWrite.selectExpr("image.origin as origin","image.geometry as geometry", "image.height as height", "image.width as width", "image.data as data", "image.nBands as nBands")
dfToWrite.write.mode("overwrite").format("geotiff").save("DESTINATION_PATH")
If you do not want the saved GeoTiff images to be distributed across multiple partitions, you can call coalesce to merge all files into a single partition:
dfToWrite.coalesce(1).write.mode("overwrite").format("geotiff").save("DESTINATION_PATH")
If you rename the columns of the GeoTiff DataFrame, you can set the corresponding column names with the option parameter. All available optional parameters are listed below:
|-- writeToCRS: (Default value "EPSG:4326") => Coordinate reference system of the geometry coordinates representing the location of the GeoTiff.
|-- fieldImage: (Default value "image") => Indicates the image column of GeoTiff DataFrame.
|-- fieldOrigin: (Default value "origin") => Indicates the origin column of GeoTiff DataFrame.
|-- fieldNBands: (Default value "nBands") => Indicates the nBands column of GeoTiff DataFrame.
|-- fieldWidth: (Default value "width") => Indicates the width column of GeoTiff DataFrame.
|-- fieldHeight: (Default value "height") => Indicates the height column of GeoTiff DataFrame.
|-- fieldGeometry: (Default value "geometry") => Indicates the geometry column of GeoTiff DataFrame.
|-- fieldData: (Default value "data") => Indicates the data column of GeoTiff DataFrame.
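One way to keep track of the options listed above is to treat them as a map of defaults that user-supplied overrides are merged into. The plain-Scala sketch below does that; `WriteOptionsSketch` and `resolve` are hypothetical helper names, and rejecting unknown keys is a design choice of this sketch, not documented behaviour of the writer.

```scala
object WriteOptionsSketch {
  // Default values copied from the option list above.
  val defaults: Map[String, String] = Map(
    "writeToCRS"    -> "EPSG:4326",
    "fieldImage"    -> "image",
    "fieldOrigin"   -> "origin",
    "fieldNBands"   -> "nBands",
    "fieldWidth"    -> "width",
    "fieldHeight"   -> "height",
    "fieldGeometry" -> "geometry",
    "fieldData"     -> "data"
  )

  // Overrides win over defaults; unknown keys are rejected so typos surface early.
  def resolve(overrides: Map[String, String]): Map[String, String] = {
    val unknown = overrides.keySet -- defaults.keySet
    require(unknown.isEmpty, s"unknown GeoTiff write options: ${unknown.mkString(", ")}")
    defaults ++ overrides
  }
}
```

The resolved map could then be handed to Spark's `DataFrameWriter.options(...)` in place of a chain of individual `option` calls, for example with `WriteOptionsSketch.resolve(Map("fieldOrigin" -> "source", "fieldGeometry" -> "geom"))`.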
An example:
var dfToWrite = sparkSession.read.format("geotiff").option("dropInvalid", true).option("readToCRS", "EPSG:4326").load("PATH_TO_INPUT_GEOTIFF_IMAGES")
dfToWrite = dfToWrite.selectExpr("image.origin as source","ST_GeomFromWkt(image.geometry) as geom", "image.height as height", "image.width as width", "image.data as data", "image.nBands as bands")
dfToWrite.write.mode("overwrite").format("geotiff").option("writeToCRS", "EPSG:4326").option("fieldOrigin", "source").option("fieldGeometry", "geom").option("fieldNBands", "bands").save("DESTINATION_PATH")