Raster loader
Note
Sedona loaders are available in Scala, Java and Python and have the same APIs.
Sedona provides two types of raster DataFrame loaders. Both read rasters through a Spark data source but load the images into different internal formats.
Load any raster to RasterUDT format
Sedona's raster loader leverages Spark's built-in binary data source and works with several RS constructors to produce rasters of RasterUDT type. Each raster is a row in the resulting DataFrame and is stored in RasterUDT format.
Load raster to a binary DataFrame
You can load any type of raster data using the code below, then apply one of the RS constructors that follow to create a RasterUDT column.
spark.read.format("binaryFile").load("/some/path/*.asc")
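For reference, the binaryFile data source produces one row per file; the raw bytes land in the content column, which is what the RS constructors consume. A minimal Scala sketch (the input path is a placeholder):
// The binary data source yields one row per file with four columns:
//   path (string), modificationTime (timestamp), length (long), content (binary)
val binaryDF = spark.read.format("binaryFile").load("/some/path/*.asc")
binaryDF.printSchema()
// Pass the raw bytes in `content` to an RS constructor such as RS_FromArcInfoAsciiGrid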
RS_FromArcInfoAsciiGrid
Introduction: Returns a raster geometry from an Arc/Info ASCII Grid file.
Format: RS_FromArcInfoAsciiGrid(asc: Array[Byte])
Since: v1.4.0
Spark SQL example:
import org.apache.spark.sql.functions.expr

var df = spark.read.format("binaryFile").load("/some/path/*.asc")
df = df.withColumn("raster", expr("RS_FromArcInfoAsciiGrid(content)"))
RS_FromGeoTiff
Introduction: Returns a raster geometry from a GeoTiff file.
Format: RS_FromGeoTiff(geoTiff: Array[Byte])
Since: v1.4.0
Spark SQL example:
var df = spark.read.format("binaryFile").load("/some/path/*.tiff")
df = df.withColumn("raster", expr("RS_FromGeoTiff(content)"))
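When loading a whole directory of rasters, Spark's general file source options such as pathGlobFilter and recursiveFileLookup also apply to the binaryFile reader. A sketch, assuming GeoTiff files spread across nested subdirectories of /some/path (the path is a placeholder):
var df = spark.read.format("binaryFile")
  .option("pathGlobFilter", "*.tiff")        // only pick up files ending in .tiff
  .option("recursiveFileLookup", "true")     // descend into subdirectories
  .load("/some/path")
df = df.withColumn("raster", expr("RS_FromGeoTiff(content)"))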
RS_MakeEmptyRaster
Introduction: Returns an empty raster geometry. Every band in the raster is initialized to 0.0.
Since: v1.4.1
Format: RS_MakeEmptyRaster(numBands: Int, width: Int, height: Int, upperleftX: Double, upperleftY: Double, cellSize: Double)
- NumBands: The number of bands in the raster. If not specified, the raster will have a single band.
- Width: The width of the raster in pixels.
- Height: The height of the raster in pixels.
- UpperleftX: The X coordinate of the upper left corner of the raster, in terms of the CRS units.
- UpperleftY: The Y coordinate of the upper left corner of the raster, in terms of the CRS units.
- Cell Size (pixel size): The size of the cells in the raster, in terms of the CRS units.
This variant uses the default Cartesian coordinate system (SRID 0).
Format: RS_MakeEmptyRaster(numBands: Int, width: Int, height: Int, upperleftX: Double, upperleftY: Double, scaleX: Double, scaleY: Double, skewX: Double, skewY: Double, srid: Int)
- NumBands: The number of bands in the raster. If not specified, the raster will have a single band.
- Width: The width of the raster in pixels.
- Height: The height of the raster in pixels.
- UpperleftX: The X coordinate of the upper left corner of the raster, in terms of the CRS units.
- UpperleftY: The Y coordinate of the upper left corner of the raster, in terms of the CRS units.
- ScaleX (pixel size on X): The size of the cells on the X axis, in terms of the CRS units.
- ScaleY (pixel size on Y): The size of the cells on the Y axis, in terms of the CRS units.
- SkewX: The skew of the raster on the X axis, in terms of the CRS units.
- SkewY: The skew of the raster on the Y axis, in terms of the CRS units.
- SRID: The SRID of the raster. Use 0 if you want to use the default Cartesian coordinate system. Use 4326 if you want to use WGS84.
SQL example 1 (with 2 bands):
SELECT RS_MakeEmptyRaster(2, 10, 10, 0.0, 0.0, 1.0) as raster
Output:
+--------------------+
|              raster|
+--------------------+
|GridCoverage2D["g...|
+--------------------+
SQL example 2 (with 2 bands, scale, skew, and SRID):
SELECT RS_MakeEmptyRaster(2, 10, 10, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 4326) as raster
Output:
+--------------------+
|              raster|
+--------------------+
|GridCoverage2D["g...|
+--------------------+
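The same constructor works through the DataFrame API as well. A minimal Scala sketch, assuming the Sedona SQL functions are already registered with the Spark session:
// Build a 2-band, 10 x 10 empty raster in WGS84; all cells start at 0.0
val emptyRasterDF = spark.sql(
  "SELECT RS_MakeEmptyRaster(2, 10, 10, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 4326) AS raster")
emptyRasterDF.printSchema()   // the raster column uses the RasterUDT type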
Load GeoTiff to Array[Double] format
Warning
This loader has been deprecated since v1.4.1. Please use the binaryFile data source together with RS_FromGeoTiff to read GeoTiff files instead.
The geotiff loader is a Sedona built-in Spark data source. It can read a single GeoTiff image or a directory of GeoTiff images into a DataFrame. Each GeoTiff is a row in the resulting DataFrame and is stored as an array of Double values.
Since: v1.1.0
Spark SQL example:
The input path can point to a single GeoTiff image or a directory of GeoTiff images. You can optionally set the dropInvalid option to true to drop invalid images. The geometry bound of each image is automatically loaded as a Sedona geometry and is transformed to the WGS84 (EPSG:4326) reference system.
var geotiffDF = sparkSession.read.format("geotiff").option("dropInvalid", true).load("YOUR_PATH")
geotiffDF.printSchema()
Output:
 |-- image: struct (nullable = true)
 |    |-- origin: string (nullable = true)
 |    |-- geometry: string (nullable = true)
 |    |-- height: integer (nullable = true)
 |    |-- width: integer (nullable = true)
 |    |-- nBands: integer (nullable = true)
 |    |-- data: array (nullable = true)
 |    |    |-- element: double (containsNull = true)
There are three more optional parameters for reading GeoTiff files:
|-- readFromCRS: Coordinate reference system of the geometry coordinates representing the location of the GeoTiff. An example value of readFromCRS is EPSG:4326.
|-- readToCRS: If you want to transform the GeoTiff location geometry coordinates to a different coordinate reference system, you can define the target coordinate reference system with this option.
|-- disableErrorInCRS: (Default value: false) Indicates whether to ignore errors in CRS transformation.
An example with all GeoTiff read options:
var geotiffDF = sparkSession.read.format("geotiff").option("dropInvalid", true).option("readFromCRS", "EPSG:4499").option("readToCRS", "EPSG:4326").option("disableErrorInCRS", true).load("YOUR_PATH")
geotiffDF.printSchema()
Output:
 |-- image: struct (nullable = true)
 |    |-- origin: string (nullable = true)
 |    |-- geometry: string (nullable = true)
 |    |-- height: integer (nullable = true)
 |    |-- width: integer (nullable = true)
 |    |-- nBands: integer (nullable = true)
 |    |-- data: array (nullable = true)
 |    |    |-- element: double (containsNull = true)
You can also select the sub-attributes individually to construct a new DataFrame:
geotiffDF = geotiffDF.selectExpr("image.origin as origin","ST_GeomFromWkt(image.geometry) as Geom", "image.height as height", "image.width as width", "image.data as data", "image.nBands as bands")
geotiffDF.createOrReplaceTempView("GeotiffDataframe")
geotiffDF.show()
Output:
+--------------------+--------------------+------+-----+--------------------+-----+
|              origin|                Geom|height|width|                data|bands|
+--------------------+--------------------+------+-----+--------------------+-----+
|file:///home/hp/D...|POLYGON ((-58.699...|    32|   32|[1058.0, 1039.0, ...|    4|
|file:///home/hp/D...|POLYGON ((-58.297...|    32|   32|[1258.0, 1298.0, ...|    4|
+--------------------+--------------------+------+-----+--------------------+-----+
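Since Geom is a regular Sedona geometry, the usual vector predicates can be applied to the image footprints. A sketch (not part of the original example, with a placeholder query polygon) that keeps only images intersecting an area of interest:
// Filter GeoTiff rows by their geographic bound
val matchingDF = spark.sql(
  "SELECT origin, Geom FROM GeotiffDataframe " +
  "WHERE ST_Intersects(Geom, ST_GeomFromWkt('POLYGON ((-60 -35, -58 -35, -58 -33, -60 -33, -60 -35))'))")
matchingDF.show()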
RS_Array
Introduction: Create an array that is filled with the given value.
Format: RS_Array(length:Int, value: Decimal)
Since: v1.1.0
Spark SQL example:
SELECT RS_Array(height * width, 0.0)
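As written, the expression needs a source table that supplies the height and width columns. A runnable sketch using the GeotiffDataframe view registered above (the zeroBand alias is illustrative):
// Build a zero-filled array sized to each image (height * width elements per row)
val zeroArrayDF = spark.sql(
  "SELECT RS_Array(height * width, 0.0) AS zeroBand FROM GeotiffDataframe")
zeroArrayDF.show()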
RS_GetBand
Introduction: Returns a particular band from a GeoTiff DataFrame. The total number of bands can be obtained from the GeoTiff loader.
Format: RS_GetBand (allBandValues: Array[Double], targetBand:Int, totalBands:Int)
Since: v1.1.0
Note
The targetBand index is 1-based: the first band is band 1, not band 0.
Spark SQL example:
val BandDF = spark.sql("select RS_GetBand(data, 2, bands) as targetBand from GeotiffDataframe")
BandDF.show()
Output:
+--------------------+
|          targetBand|
+--------------------+
|[1058.0, 1039.0, ...|
|[1258.0, 1298.0, ...|
+--------------------+
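Multiple bands can be extracted in one query in the same way. A sketch reusing the GeotiffDataframe view and its bands column (the band1/band2 aliases are illustrative):
// Pull the first and second band of every image as separate array columns
val bandsDF = spark.sql(
  "SELECT RS_GetBand(data, 1, bands) AS band1, RS_GetBand(data, 2, bands) AS band2 " +
  "FROM GeotiffDataframe")
bandsDF.show()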