Raster data in GeoTiff and ArcInfoAsciiGrid formats can be read into and written from Spark.
Using the RasterUDT
Read
Raster data in GeoTiff and ArcInfo ASCII Grid formats can be loaded directly into Spark using sparklyr::spark_read_binary and the Sedona constructors RS_FromGeoTiff and RS_FromArcInfoAsciiGrid.
library(dplyr)
library(sparklyr)
library(apache.sedona)
sc <- spark_connect(master = "local")
data_tbl <- spark_read_binary(sc, dir = here::here("/../spark/common/src/test/resources/raster/"), name = "data")
raster <-
  data_tbl %>%
  mutate(raster = RS_FromGeoTiff(content))
raster
raster %>% sdf_schema()
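ArcInfo ASCII Grid files can presumably be loaded the same way with RS_FromArcInfoAsciiGrid. The following is a minimal sketch, not a tested recipe: asc_dir is a hypothetical directory that you would replace with a real one, and the path_glob_filter argument of spark_read_binary is used on the assumption that the directory may also contain other file types.

```r
library(dplyr)
library(sparklyr)
library(apache.sedona)

# `sc` is the Spark connection opened above.
# Hypothetical directory containing .asc files; replace with a real path.
asc_dir <- "path/to/asc/files"

asc_tbl <- spark_read_binary(
  sc,
  dir = asc_dir,
  name = "asc_data",
  path_glob_filter = "*.asc" # only read ArcInfo ASCII Grid files
)

# Build a raster column from the binary content of each file
asc_raster <-
  asc_tbl %>%
  mutate(raster = RS_FromArcInfoAsciiGrid(content))

asc_raster %>% sdf_schema()
```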
Once the data is loaded, raster functions are available in dplyr workflows:
- Functions taking raster: Raster arguments, such as RS_Value, RS_Values, and RS_Envelope, are meant to be used with data loaded with this reader.
- Functions taking Band: Array[Double] arguments work with data loaded using the Sedona Geotiff DataFrame loader (see below).
For example, getting the number of bands:
raster %>%
  mutate(
    nbands = RS_NumBands(raster)
  ) %>%
  select(path, nbands) %>%
  collect() %>%
  mutate(path = path %>% basename())
Or getting the envelope:
raster %>%
  mutate(
    env = RS_Envelope(raster) %>% st_astext()
  ) %>%
  select(path, env) %>%
  collect() %>%
  mutate(path = path %>% basename())
Or getting values at specific points:
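A minimal sketch using RS_Value with a point built by ST_Point, following the same pattern as the examples above. The coordinates are hypothetical and assumed to be expressed in the rasters' coordinate reference system; replace them with a point that actually falls inside your rasters. Rows whose raster does not cover the point are not expected to yield a usable value.

```r
# Hypothetical coordinates, assumed to be in the rasters' CRS
values_at_point <-
  raster %>%
  mutate(
    val = RS_Value(raster, ST_Point(-13077301.685, 4002565.802))
  ) %>%
  select(path, val) %>%
  collect() %>%
  mutate(path = path %>% basename())

values_at_point
```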
Write
To write a Sedona Raster DataFrame to raster files, you need to (1) first convert the Raster DataFrame to a binary DataFrame using one of the RS_AsXXX functions (for example, RS_AsGeoTiff) and (2) then write the binary DataFrame to raster files using Sedona’s built-in raster data source.
To write a Sedona binary DataFrame to external storage using Sedona’s built-in raster data source, use the spark_write_raster function:
dest_file <- tempfile()
raster %>%
  mutate(content = RS_AsGeoTiff(raster)) %>%
  spark_write_raster(path = dest_file)
dir(dest_file, recursive = TRUE)
Available options (see Raster writer):
- rasterField: the binary column to be saved. If the DataFrame has only one binary column, it is used by default; otherwise the column must be specified.
- fileExtension: .tiff by default; also accepts .png, .jpeg, and .asc.
- pathField: optional name of a column that holds the path of each raster file. If it is not set, random UUIDs are generated.
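The options above could be passed as in the following sketch. It assumes that spark_write_raster accepts an options list in the same way as the other sparklyr-style writers; dest_dir is an illustrative name, and pathField is deliberately left unset so that file names fall back to random UUIDs.

```r
dest_dir <- tempfile()

raster %>%
  mutate(content = RS_AsGeoTiff(raster)) %>%
  spark_write_raster(
    path = dest_dir,
    # Assumption: writer options are supplied as a named list
    options = list(
      rasterField = "content",  # binary column holding the encoded rasters
      fileExtension = ".tiff"   # matches the RS_AsGeoTiff encoding
    )
  )

# List the files that were written
written_files <- dir(dest_dir, recursive = TRUE)
written_files
```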