Raster loader

Note

Sedona loaders are available in Scala, Java, and Python and share the same API.

Loading raster using the raster data source

The raster data source loads GeoTiff files and automatically splits them into smaller tiles. Each tile becomes one row in the resulting DataFrame, stored in Raster format.

Scala

var rawDf = sedona.read.format("raster").load("/some/path/*.tif")
rawDf.createOrReplaceTempView("rawdf")
rawDf.show()

Java

Dataset<Row> rawDf = sedona.read().format("raster").load("/some/path/*.tif");
rawDf.createOrReplaceTempView("rawdf");
rawDf.show();

Python

rawDf = sedona.read.format("raster").load("/some/path/*.tif")
rawDf.createOrReplaceTempView("rawdf")
rawDf.show()

The output will look like this:

+--------------------+---+---+----+
|                rast|  x|  y|name|
+--------------------+---+---+----+
|GridCoverage2D["g...|  0|  0| ...|
|GridCoverage2D["g...|  1|  0| ...|
|GridCoverage2D["g...|  2|  0| ...|
...

The output contains the following columns:

  • rast: The raster data in Raster format.
  • x: The 0-based x-coordinate of the tile. This column is present only when retile is enabled.
  • y: The 0-based y-coordinate of the tile. This column is present only when retile is enabled.
  • name: The name of the raster file.
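To make the column layout concrete, here is a small sketch that queries the rawdf view registered above. It assumes retile is enabled (so x and y exist) and uses the RS_Width and RS_Height raster functions; the helper name describe_tiles is hypothetical and only for illustration.

```python
def describe_tiles(sedona, view_name="rawdf"):
    """Return one row per tile: file name, tile grid position, and pixel size.

    `sedona` is an existing Sedona session, and `view_name` is a temp view
    created from the raster data source with retile enabled.
    """
    query = (
        "SELECT name, x, y, "
        "RS_Width(rast) AS width, RS_Height(rast) AS height "
        f"FROM {view_name}"
    )
    return sedona.sql(query)
```

Calling describe_tiles(sedona).show() should list each tile next to the file it came from.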

The tile size is determined by the internal tiling scheme of the raster data. Cloud Optimized GeoTIFF (COG) is the recommended format because COG files usually organize pixel data as square tiles. You can also disable automatic tiling using option("retile", "false"), or specify the tile size manually using options such as option("tileWidth", "256") and option("tileHeight", "256").
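As a rough sketch, the tiling options can be wrapped in a small helper. The function name load_tiled_rasters and its parameters are hypothetical; only the format name and the option keys come from this page.

```python
def load_tiled_rasters(sedona, path, tile_width=None, tile_height=None, retile=True):
    """Load rasters through the raster data source with explicit tiling options.

    `sedona` is an existing Sedona session, as in the examples above.
    """
    reader = sedona.read.format("raster")
    if not retile:
        # Tiling disabled: rasters are not split, and x / y columns are absent.
        return reader.option("retile", "false").load(path)
    if tile_width is not None:
        reader = reader.option("tileWidth", str(tile_width))
    if tile_height is not None:
        # When omitted, tileHeight falls back to tileWidth if that was set.
        reader = reader.option("tileHeight", str(tile_height))
    return reader.load(path)
```

For example, load_tiled_rasters(sedona, "/some/path/*.tif", tile_width=256) requests 256 x 256 tiles, while retile=False loads the rasters without splitting them.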

The options for the raster data source are as follows:

  • retile: Whether to enable tiling. Default is true.
  • tileWidth: The width of the tile. If not specified, the size of internal tiles will be used.
  • tileHeight: The height of the tile. If not specified, tileWidth is used when it is explicitly set; otherwise the size of internal tiles is used.
  • padWithNoData: Pad the right and bottom of the tile with NODATA values if the tile is smaller than the specified tile size. Default is false.
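The arithmetic behind tileWidth, tileHeight, and padWithNoData can be illustrated in plain Python. This is not Sedona's implementation; it is a sketch that assumes tiles are laid out from the top-left corner and that edge tiles are clipped to the raster extent unless padding is requested. The function name tile_grid is hypothetical.

```python
import math

def tile_grid(raster_width, raster_height, tile_width, tile_height=None,
              pad_with_nodata=False):
    """Return (x, y, width, height) for every tile, in row-major order."""
    if tile_height is None:
        # Mirrors the documented default: tileHeight falls back to tileWidth.
        tile_height = tile_width
    cols = math.ceil(raster_width / tile_width)
    rows = math.ceil(raster_height / tile_height)
    tiles = []
    for y in range(rows):
        for x in range(cols):
            # Edge tiles on the right and bottom may be smaller than requested.
            width = min(tile_width, raster_width - x * tile_width)
            height = min(tile_height, raster_height - y * tile_height)
            if pad_with_nodata:
                # Padding brings every tile up to the full requested size.
                width, height = tile_width, tile_height
            tiles.append((x, y, width, height))
    return tiles
```

Under these assumptions, a 600 x 500 raster with 256-pixel tiles yields a 3 x 2 grid whose bottom-right tile is 88 x 244 pixels, or 256 x 256 with padding.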

Note

If the internal tiling scheme of the raster data is not suitable for tiling, the raster data source throws an error. To work around this, disable automatic tiling using option("retile", "false") or specify the tile size manually. A better solution is to convert the raster data to COG format using gdal_translate or other tools.
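One way to script the COG conversion is to call gdal_translate from Python. This sketch assumes GDAL 3.1 or later is installed and on the PATH, since the COG output driver was introduced in that release; the helper names and file paths are hypothetical.

```python
import subprocess

def cog_translate_command(src, dst):
    """Build the gdal_translate command that writes `src` as a COG at `dst`."""
    return ["gdal_translate", "-of", "COG", src, dst]

def convert_to_cog(src, dst):
    """Run gdal_translate; raises CalledProcessError if the conversion fails."""
    subprocess.run(cog_translate_command(src, dst), check=True)
```

After conversion, the resulting file can be loaded with the raster data source as usual.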

The raster data source also works with Spark generic file source options, such as option("pathGlobFilter", "*.tif*") and option("recursiveFileLookup", "true"). For instance, you can load all the .tif files recursively in a directory using

sedona.read.format("raster").option("recursiveFileLookup", "true").option(
    "pathGlobFilter", "*.tif*"
).load(path_to_raster_data_folder)

One difference from other file source loaders is that when the loaded path ends with /, the raster data source will look up raster files in the directory and all its subdirectories recursively. This is equivalent to specifying a path without trailing / and setting option("recursiveFileLookup", "true").
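The equivalence described above can be written out explicitly. The helper below is a hypothetical sketch: it strips any trailing / and sets recursiveFileLookup itself, so the recursive behavior no longer depends on how the path happens to be spelled.

```python
def load_rasters_recursively(sedona, directory):
    """Load all rasters under `directory`, including its subdirectories."""
    # Drop trailing slashes; keep "/" itself if the path is the root.
    normalized = directory.rstrip("/") or "/"
    return (
        sedona.read.format("raster")
        .option("recursiveFileLookup", "true")
        .load(normalized)
    )
```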

Loading raster using binaryFile loader (Deprecated)

Sedona's raster loader leverages Spark's built-in binary data source and works with several RS constructors to produce the Raster type. Each raster is one row in the resulting DataFrame and is stored in Raster format.

Tip

After loading rasters, you can quickly visualize them in a Jupyter notebook using SedonaUtils.display_image(df). It automatically detects raster columns and renders them as images. See Raster visualizer docs for details.

Since v1.5.0, these functions use lon/lat order by default. Earlier versions used lat/lon order.

Step 1: Load raster to a binary DataFrame

You can load any type of raster data using the code below. Then use the RS constructors below to create a Raster DataFrame.

sedona.read.format("binaryFile").load("/some/path/*.asc")

Step 2: Create a raster type column

Use one of the RS raster constructors to turn the binary content into a Raster column. See the full list of Raster Constructors for the available options.
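As an illustration of the two steps together, the sketch below reads .asc files with the binaryFile source and applies RS_FromArcInfoAsciiGrid to the content column that Spark's binary data source produces. The helper name is hypothetical; for GeoTiff input, RS_FromGeoTiff would take the place of the constructor shown here.

```python
def load_ascii_grid_rasters(sedona, path):
    """Load Arc/Info ASCII grid files into a DataFrame with a Raster column."""
    # Step 1: each file becomes one row; the raw bytes are in `content`.
    binary_df = sedona.read.format("binaryFile").load(path)
    # Step 2: build the Raster column and keep the source path for reference.
    return binary_df.selectExpr(
        "RS_FromArcInfoAsciiGrid(content) AS rast",
        "path",
    )
```

For example, load_ascii_grid_rasters(sedona, "/some/path/*.asc") returns one raster per file.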