Raster loader


Sedona loaders are available in Scala, Java and Python and share the same APIs.

Sedona provides two types of raster DataFrame loaders. Both use built-in data sources but load raster images into different internal formats.

Load any raster to RasterUDT format

The raster loader of Sedona leverages Spark's built-in binary data source and works with several RS constructors to produce the RasterUDT type. Each raster is a row in the resulting DataFrame and is stored in RasterUDT format.

Load raster to a binary DataFrame

You can load any type of raster data using the code below. Then use the RS constructors below to create RasterUDT.

var df = spark.read.format("binaryFile").load("/some/path/*.asc")


RS_FromArcInfoAsciiGrid

Introduction: Returns a raster geometry from an Arc Info ASCII Grid file.

Format: RS_FromArcInfoAsciiGrid(asc: Array[Byte])

Since: v1.4.0

Spark SQL example:

var df = spark.read.format("binaryFile").load("/some/path/*.asc")
df = df.withColumn("raster", f.expr("RS_FromArcInfoAsciiGrid(content)"))


RS_FromGeoTiff

Introduction: Returns a raster geometry from a GeoTiff file.

Format: RS_FromGeoTiff(asc: Array[Byte])

Since: v1.4.0

Spark SQL example:

var df = spark.read.format("binaryFile").load("/some/path/*.tiff")
df = df.withColumn("raster", f.expr("RS_FromGeoTiff(content)"))
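
Both file-based constructors follow the same pattern: read the files with the binaryFile data source, then pass the content column to the constructor that matches the file format. A minimal Python sketch of that pattern follows. The helper names raster_expr and load_rasters and the extension-to-constructor table are illustrative assumptions, not part of Sedona's API, and load_rasters presumes a Spark session on which the Sedona SQL functions are already registered.

```python
import os

# Hypothetical lookup table: file extension -> Sedona raster constructor.
# The ".tif" entry is an assumption; the text above only shows "*.asc" and "*.tiff".
CONSTRUCTORS = {
    ".asc": "RS_FromArcInfoAsciiGrid",
    ".tiff": "RS_FromGeoTiff",
    ".tif": "RS_FromGeoTiff",
}


def raster_expr(path_glob, column="content"):
    """Build the SQL expression that turns the binary column into a raster."""
    ext = os.path.splitext(path_glob)[1].lower()
    if ext not in CONSTRUCTORS:
        raise ValueError(f"no raster constructor known for extension {ext!r}")
    return f"{CONSTRUCTORS[ext]}({column})"


def load_rasters(spark, path_glob):
    """Read files as binary rows and add a 'raster' column (needs PySpark and Sedona)."""
    from pyspark.sql import functions as f  # imported lazily so the helper above runs without Spark

    df = spark.read.format("binaryFile").load(path_glob)
    return df.withColumn("raster", f.expr(raster_expr(path_glob)))


print(raster_expr("/some/path/*.asc"))
print(raster_expr("/some/path/*.tiff"))
```

Keeping the expression builder separate from the Spark call makes the format dispatch easy to check without starting a cluster.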


RS_MakeEmptyRaster

Introduction: Returns an empty raster geometry. Every band in the raster is initialized to 0.0.

Since: v1.4.1

Format: RS_MakeEmptyRaster(numBands:Int, width: Int, height: Int, upperleftX: Double, upperleftY: Double, cellSize:Double)

  • NumBands: The number of bands in the raster. If not specified, the raster will have a single band.
  • Width: The width of the raster in pixels.
  • Height: The height of the raster in pixels.
  • UpperleftX: The X coordinate of the upper left corner of the raster, in terms of the CRS units.
  • UpperleftY: The Y coordinate of the upper left corner of the raster, in terms of the CRS units.
  • Cell Size (pixel size): The size of the cells in the raster, in terms of the CRS units.

It uses the default Cartesian coordinate system.

Format: RS_MakeEmptyRaster(numBands:Int, width: Int, height: Int, upperleftX: Double, upperleftY: Double, scaleX:Double, scaleY:Double, skewX:Double, skewY:Double, srid: Int)

  • NumBands: The number of bands in the raster. If not specified, the raster will have a single band.
  • Width: The width of the raster in pixels.
  • Height: The height of the raster in pixels.
  • UpperleftX: The X coordinate of the upper left corner of the raster, in terms of the CRS units.
  • UpperleftY: The Y coordinate of the upper left corner of the raster, in terms of the CRS units.
  • ScaleX (pixel size on X): The size of the cells on the X axis, in terms of the CRS units.
  • ScaleY (pixel size on Y): The size of the cells on the Y axis, in terms of the CRS units.
  • SkewX: The skew of the raster on the X axis, in terms of the CRS units.
  • SkewY: The skew of the raster on the Y axis, in terms of the CRS units.
  • SRID: The SRID of the raster. Use 0 if you want to use the default Cartesian coordinate system. Use 4326 if you want to use WGS84.

SQL example 1 (with 2 bands):

SELECT RS_MakeEmptyRaster(2, 10, 10, 0.0, 0.0, 1.0) as raster


|rs_makeemptyraster(2, 10, 10, 0.0, 0.0, 1.0)|
|                        GridCoverage2D["g...|

SQL example 2 (with 2 bands, scale, skew, and SRID):

SELECT RS_MakeEmptyRaster(2, 10, 10, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 4326) as raster


|rs_makeemptyraster(2, 10, 10, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 4326)|
|                                             GridCoverage2D["g...|
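
To make the scale and skew parameters concrete, the sketch below maps a pixel position (column, row) to CRS coordinates using the conventional six-parameter affine transform. It is a plain Python illustration, not Sedona's implementation: the function name pixel_to_world is hypothetical, and the sign convention for scaleY (negative for a north-up raster, as in GDAL and PostGIS) is an assumption that the parameter list above does not confirm for Sedona.

```python
def pixel_to_world(col, row, upper_left_x, upper_left_y,
                   scale_x, scale_y, skew_x=0.0, skew_y=0.0):
    """Return the CRS coordinates of the upper-left corner of pixel (col, row).

    Conventional affine georeferencing (assumed, not taken from Sedona):
        x = upper_left_x + col * scale_x + row * skew_x
        y = upper_left_y + col * skew_y  + row * scale_y
    """
    x = upper_left_x + col * scale_x + row * skew_x
    y = upper_left_y + col * skew_y + row * scale_y
    return x, y


# A 10 x 10 raster anchored at (0.0, 0.0) with 1.0-unit cells and no skew.
# scale_y is passed as -1.0 on the assumption that rows run southward.
print(pixel_to_world(0, 0, 0.0, 0.0, 1.0, -1.0))    # (0.0, 0.0)
print(pixel_to_world(10, 10, 0.0, 0.0, 1.0, -1.0))  # (10.0, -10.0)
```

With zero skew the raster is axis-aligned, so the second call gives the corner diagonally opposite the upper-left origin.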

Load GeoTiff to Array[Double] format


This loader has been deprecated since v1.4.1. Please use the binaryFile data source together with RS_FromGeoTiff to read GeoTiff files instead.

Sedona's GeoTiff loader is a built-in Spark data source. It can read a single GeoTiff image or a set of GeoTiff images into a DataFrame. Each GeoTiff is a row in the resulting DataFrame and is stored as an array of Double values.

Since: v1.1.0

Spark SQL example:

The input path can be a path to a single GeoTiff image or a directory of GeoTiff images. You can optionally append an option to drop invalid images. The geometry bound of each image is automatically loaded as a Sedona geometry and is transformed to the WGS84 (EPSG:4326) reference system.

var geotiffDF = spark.read.format("geotiff").option("dropInvalid", true).load("YOUR_PATH")


 |-- image: struct (nullable = true)
 |    |-- origin: string (nullable = true)
 |    |-- Geometry: string (nullable = true)
 |    |-- height: integer (nullable = true)
 |    |-- width: integer (nullable = true)
 |    |-- nBands: integer (nullable = true)
 |    |-- data: array (nullable = true)
 |    |    |-- element: double (containsNull = true)

There are three more optional parameters for reading GeoTiff:

  • readFromCRS: The coordinate reference system of the geometry coordinates that represent the location of the GeoTiff. An example value of readFromCRS is EPSG:4326.
  • readToCRS: The target coordinate reference system, if you want to transform the GeoTiff location geometry coordinates to a different coordinate reference system.
  • disableErrorInCRS: Indicates whether to ignore errors in CRS transformation. The default value is false.

An example with all GeoTiff read options:

var geotiffDF = spark.read.format("geotiff").option("dropInvalid", true).option("readFromCRS", "EPSG:4499").option("readToCRS", "EPSG:4326").option("disableErrorInCRS", true).load("YOUR_PATH")


 |-- image: struct (nullable = true)
 |    |-- origin: string (nullable = true)
 |    |-- Geometry: string (nullable = true)
 |    |-- height: integer (nullable = true)
 |    |-- width: integer (nullable = true)
 |    |-- nBands: integer (nullable = true)
 |    |-- data: array (nullable = true)
 |    |    |-- element: double (containsNull = true)
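
The read options can also be collected in one place before the reader is configured. The following Python sketch is one way to do that. geotiff_options and read_geotiff are hypothetical helper names; only the four option keys come from the text above. Options that are not set are left out, so the data source's own defaults apply. The sketch relies on PySpark's DataFrameReader.options, which accepts keyword arguments.

```python
def geotiff_options(drop_invalid=None, read_from_crs=None,
                    read_to_crs=None, disable_error_in_crs=None):
    """Collect GeoTiff reader options, skipping any that were not set."""
    candidates = {
        "dropInvalid": drop_invalid,
        "readFromCRS": read_from_crs,
        "readToCRS": read_to_crs,
        "disableErrorInCRS": disable_error_in_crs,
    }
    return {key: value for key, value in candidates.items() if value is not None}


def read_geotiff(spark, path, **kwargs):
    """Apply the collected options to the (deprecated) geotiff data source."""
    return spark.read.format("geotiff").options(**geotiff_options(**kwargs)).load(path)


opts = geotiff_options(drop_invalid=True, read_from_crs="EPSG:4499",
                       read_to_crs="EPSG:4326", disable_error_in_crs=True)
print(opts)
```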

You can also select sub-attributes individually to construct a new DataFrame:

geotiffDF = geotiffDF.selectExpr("image.origin as origin","ST_GeomFromWkt(image.geometry) as Geom", "image.height as height", "image.width as width", "image.data as data", "image.nBands as bands")


|              origin|                Geom|height|width|                data|bands|
|file:///home/hp/D...|POLYGON ((-58.699...|    32|   32|[1058.0, 1039.0, ...|    4|
|file:///home/hp/D...|POLYGON ((-58.297...|    32|   32|[1258.0, 1298.0, ...|    4|


RS_Array

Introduction: Creates an array filled with the given value.

Format: RS_Array(length:Int, value: Decimal)

Since: v1.1.0

Spark SQL example:

SELECT RS_Array(height * width, 0.0)
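
As a rough mental model, RS_Array behaves like building a constant list of the requested length. The Python sketch below mirrors that behavior outside Spark. rs_array is a stand-in name used for illustration only and does not show how Sedona implements the function.

```python
def rs_array(length, value):
    """Return a list with `length` entries, each equal to `value` as a float."""
    return [float(value)] * length


# Same shape as the SQL example: one value per pixel of a 32 x 32 image.
height, width = 32, 32
zeros = rs_array(height * width, 0.0)
print(len(zeros))   # 1024
print(zeros[:3])    # [0.0, 0.0, 0.0]
```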


RS_GetBand

Introduction: Returns a particular band from the GeoTiff DataFrame.

The total number of bands can be obtained from the GeoTiff loader.

Format: RS_GetBand(allBandValues: Array[Double], targetBand: Int, totalBands: Int)

Since: v1.1.0


The targetBand index is 1-based: the first band has index 1, not 0.

Spark SQL example:

val BandDF = spark.sql("select RS_GetBand(data, 2, Band) as targetBand from GeotiffDataframe")


|          targetBand|
|[1058.0, 1039.0, ...|
|[1258.0, 1298.0, ...|
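
The band selection can be pictured with a small Python sketch. It assumes the data array stores bands one after another (all values of band 1, then all values of band 2, and so on) and that every band holds the same number of values. That layout is an assumption made for illustration and is not stated above, and the function name get_band is likewise hypothetical.

```python
def get_band(all_band_values, target_band, total_bands):
    """Slice one band out of a flat array, using a 1-based band index."""
    if total_bands <= 0 or len(all_band_values) % total_bands != 0:
        raise ValueError("array length must be a multiple of total_bands")
    if not 1 <= target_band <= total_bands:
        raise ValueError("target_band is 1-based and must not exceed total_bands")
    band_len = len(all_band_values) // total_bands
    start = (target_band - 1) * band_len  # band 1 starts at offset 0
    return all_band_values[start:start + band_len]


# Two bands of three values each, stored band after band (assumed layout).
data = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
print(get_band(data, 1, 2))  # [1.0, 2.0, 3.0]
print(get_band(data, 2, 2))  # [10.0, 20.0, 30.0]
```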

Last update: June 14, 2023 04:27:38