
Raster data in GeoTiff and ArcInfoAsciiGrid formats can be read into and written from Spark.

Using the RasterUDT

Read

Raster data in GeoTiff and ArcInfo ASCII Grid formats can be loaded directly into Spark using sparklyr::spark_read_binary() together with the Sedona constructors RS_FromGeoTiff and RS_FromArcInfoAsciiGrid.

library(dplyr)
library(sparklyr)
library(apache.sedona)

sc <- spark_connect(master = "local")

data_tbl <- spark_read_binary(sc, dir = here::here("/../spark/common/src/test/resources/raster/"), name = "data")

raster <-
  data_tbl %>%
  mutate(raster = RS_FromGeoTiff(content))

raster

raster %>% sdf_schema()
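The example above reads GeoTiff files only. To make the ArcInfo ASCII grid format concrete, here is a minimal base R sketch that writes a tiny .asc file by hand. The header values, the cell values and the file name tiny.asc are illustrative assumptions, not Sedona test data.

```r
# Write a tiny 2-row x 3-column ArcInfo ASCII grid (.asc) by hand to show the
# plain-text layout that RS_FromArcInfoAsciiGrid() parses.
# Assumption: all header and cell values are illustrative only.
asc_dir <- tempfile("asc_example_")
dir.create(asc_dir)
asc_file <- file.path(asc_dir, "tiny.asc")

header <- c(
  "ncols 3",            # number of columns
  "nrows 2",            # number of rows
  "xllcorner 0.0",      # x coordinate of the grid's lower-left corner
  "yllcorner 0.0",      # y coordinate of the grid's lower-left corner
  "cellsize 10.0",      # cell width and height, in CRS units
  "NODATA_value -9999"  # value marking missing cells
)

# Cell values are stored row by row, starting with the top (northernmost) row
values <- matrix(c(1, 2, 3,
                   4, -9999, 6), nrow = 2, byrow = TRUE)
rows <- apply(values, 1, paste, collapse = " ")

writeLines(c(header, rows), asc_file)
cat(readLines(asc_file), sep = "\n")
```

Reading asc_dir with spark_read_binary() and calling RS_FromArcInfoAsciiGrid(content) in place of RS_FromGeoTiff(content) should then produce a raster column in the same way as the GeoTiff example.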

Once the data is loaded, raster functions are available in dplyr workflows:

Functions with a Raster argument (for example RS_Value, RS_Values and RS_Envelope) are meant to be used with data loaded with this reader. Functions with a Band: Array[Double] argument work with data loaded using the Sedona GeoTiff DataFrame loader (see below).

For example, getting the number of bands:

raster %>%
  mutate(
    nbands = RS_NumBands(raster)
  ) %>%
  select(path, nbands) %>%
  collect() %>%
  mutate(path = path %>% basename())

Or getting the envelope:

raster %>%
  mutate(
    env = RS_Envelope(raster) %>% st_astext()
  ) %>%
  select(path, env) %>%
  collect() %>%
  mutate(path = path %>% basename())
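Once the envelope has been collected as WKT text, it can be handy to check locally whether a query point lies inside it before calling RS_Value. The functions wkt_bbox() and in_bbox() below are hypothetical helpers written for this sketch, not part of apache.sedona; they assume the WKT contains only plain decimal or exponent-notation coordinates, as ST_AsText output normally does.

```r
# Hypothetical helpers (not part of apache.sedona).
# wkt_bbox(): turn the WKT text of an envelope polygon, such as the collected
# `env` column, into a bounding box. Uses min/max, so vertex order is irrelevant.
wkt_bbox <- function(wkt) {
  # Extract every number (optional sign, decimals, exponent) from the string
  nums <- as.numeric(regmatches(
    wkt, gregexpr("-?[0-9]+(\\.[0-9]+)?([eE][-+]?[0-9]+)?", wkt)
  )[[1]])
  # WKT coordinates alternate x, y, x, y, ...
  xs <- nums[c(TRUE, FALSE)]
  ys <- nums[c(FALSE, TRUE)]
  c(xmin = min(xs), ymin = min(ys), xmax = max(xs), ymax = max(ys))
}

# in_bbox(): TRUE if (x, y) lies inside or on the edge of the bounding box
in_bbox <- function(bbox, x, y) {
  x >= bbox[["xmin"]] && x <= bbox[["xmax"]] &&
    y >= bbox[["ymin"]] && y <= bbox[["ymax"]]
}

bb <- wkt_bbox("POLYGON ((0 0, 0 20, 30 20, 30 0, 0 0))")
in_bbox(bb, 15, 10)
in_bbox(bb, 35, 10)
```

Comparing bounding boxes this way is only a coarse pre-check; it says nothing about NODATA cells inside the envelope.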

Or getting values at specific points:

raster %>%
  mutate(
    val = RS_Value(raster, ST_Point(-13077301.685, 4002565.802))
  ) %>%
  select(path, val) %>%
  collect() %>%
  mutate(path = path %>% basename())
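The point passed to RS_Value has to be expressed in the raster's coordinate reference system. The magnitude of the coordinates used above suggests the test raster is in spherical Web Mercator (EPSG:3857); that is an assumption here, so check your own raster's CRS. Under that assumption, this base R sketch converts between longitude/latitude and Web Mercator metres using the standard spherical formulas, which helps when choosing query points. The function names are illustrative.

```r
# Spherical Web Mercator (EPSG:3857) <-> longitude/latitude in degrees.
# Assumption: the raster being queried uses EPSG:3857.
earth_radius <- 6378137  # sphere radius in metres used by EPSG:3857

lonlat_to_mercator <- function(lon, lat) {
  c(x = earth_radius * lon * pi / 180,
    # lat * pi / 360 is half the latitude in radians
    y = earth_radius * log(tan(pi / 4 + lat * pi / 360)))
}

mercator_to_lonlat <- function(x, y) {
  c(lon = x / earth_radius * 180 / pi,
    lat = (2 * atan(exp(y / earth_radius)) - pi / 2) * 180 / pi)
}

# Longitude/latitude of the point used in the RS_Value() example
mercator_to_lonlat(-13077301.685, 4002565.802)
```

Going the other way, lonlat_to_mercator() gives coordinates that can be passed straight to ST_Point(), provided the EPSG:3857 assumption holds.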

Write

To write a Sedona raster DataFrame to raster files, first convert the raster column to binary with one of the RS_AsXXX functions (for example RS_AsGeoTiff or RS_AsArcGrid), then write the resulting binary DataFrame with Sedona’s built-in raster data source.

To write a Sedona binary DataFrame to external storage using Sedona’s built-in raster data source, use the spark_write_raster function:

dest_file <- tempfile()
raster %>%
  mutate(content = RS_AsGeoTiff(raster)) %>%
  spark_write_raster(path = dest_file)

dir(dest_file, recursive = TRUE)

Available options (see the Raster writer documentation):

  • rasterField: the binary column to save. If the DataFrame has only one binary column, it is used by default; otherwise this option must be set.
  • fileExtension: .tiff by default; .png, .jpeg and .asc are also accepted.
  • pathField: if set, the name of a column holding the path of each raster file; otherwise random UUIDs are generated as file names.

For example, writing ArcInfo ASCII grid files with all three options set:

dest_file <- tempfile()
raster %>%
  mutate(content = RS_AsArcGrid(raster)) %>%
  spark_write_raster(
    path = dest_file,
    options = list(
      "rasterField" = "content",
      "fileExtension" = ".asc",
      "pathField" = "path"
    )
  )

dir(dest_file, recursive = TRUE)
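When the writer options are built programmatically, a small wrapper can keep option names consistent and catch unsupported extensions early. raster_write_options() is a hypothetical helper for this sketch, not part of apache.sedona; its list of accepted extensions simply mirrors the options described on this page and may lag behind the Sedona version you use.

```r
# Hypothetical helper (not part of apache.sedona): assemble the `options`
# list for spark_write_raster() from the three documented options.
raster_write_options <- function(raster_field = NULL,
                                 file_extension = ".tiff",
                                 path_field = NULL) {
  # Assumption: mirrors the extensions listed on this page
  allowed <- c(".tiff", ".png", ".jpeg", ".asc")
  if (!file_extension %in% allowed) {
    stop("fileExtension must be one of: ", paste(allowed, collapse = ", "))
  }
  opts <- list(fileExtension = file_extension)
  # Only set the optional fields when given, so the writer's defaults apply
  if (!is.null(raster_field)) opts$rasterField <- raster_field
  if (!is.null(path_field)) opts$pathField <- path_field
  opts
}

str(raster_write_options("content", ".asc", "path"))
```

The returned list can be passed as the options argument of spark_write_raster(), equivalent to the literal list in the example above.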