Functions to write geospatial data from Spark DataFrames to a variety of formats.

  • spark_write_geojson: to GeoJSON

  • spark_write_geoparquet: to GeoParquet

  • spark_write_raster: to raster tiles after using RS output functions (RS_AsXXX)

Usage

spark_write_geojson(
  x,
  path,
  mode = NULL,
  options = list(),
  partition_by = NULL,
  ...
)

spark_write_geoparquet(
  x,
  path,
  mode = NULL,
  options = list(),
  partition_by = NULL,
  ...
)

spark_write_raster(
  x,
  path,
  mode = NULL,
  options = list(),
  partition_by = NULL,
  ...
)

Arguments

x

A Spark DataFrame or dplyr operation

path

The path to the file. Needs to be accessible from the cluster. Supports the "hdfs://", "s3a://" and "file://" protocols.

mode

A character element. Specifies the behavior when the data or table already exists. Supported values include: 'error', 'append', 'overwrite' and 'ignore'. Note that 'overwrite' will also change the column structure.

For more details see also https://spark.apache.org/docs/latest/sql-programming-guide.html for your version of Spark.

options

A list of strings with additional options.

partition_by

A character vector. Partitions the output by the given columns on the file system.

...

Optional arguments; currently unused.

See also

Other Sedona DF data interface functions: spark_read_shapefile()

Examples

library(sparklyr)
library(apache.sedona)

sc <- spark_connect(master = "spark://HOST:PORT")

if (!inherits(sc, "test_connection")) {
  tbl <- dplyr::tbl(
    sc,
    dplyr::sql("SELECT ST_GeomFromText('POINT(-71.064544 42.28787)') AS `pt`")
  )
  spark_write_geojson(
    tbl %>% dplyr::mutate(id = 1),
    path = "/tmp/pts.geojson"
  )
}
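
The mode and partition_by arguments described above can be combined in a single call. The following is a minimal sketch, not a definitive recipe: it assumes a local Spark master, a hypothetical output directory "/tmp/pts_geoparquet", and that the GeoParquet writer honors partition_by in your Sedona version.

```r
library(sparklyr)
library(apache.sedona)

# Assumption: a local Spark master; replace with your cluster's URL.
sc <- spark_connect(master = "local")

# Build a one-row Spark DataFrame with a geometry column and an id column.
pts <- dplyr::tbl(
  sc,
  dplyr::sql("SELECT ST_GeomFromText('POINT(-71.064544 42.28787)') AS `pt`")
) %>%
  dplyr::mutate(id = 1)

# Hypothetical output path. mode = "overwrite" replaces any existing output;
# partition_by = "id" asks for one subdirectory per distinct id value.
spark_write_geoparquet(
  pts,
  path = "/tmp/pts_geoparquet",
  mode = "overwrite",
  partition_by = "id"
)

spark_disconnect(sc)
```

Passing mode explicitly makes the call safe to re-run, since the default behavior when the target already exists depends on the Spark version in use.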