Functions to write geospatial data into a variety of formats from Spark DataFrames.
- spark_write_geojson
: to GeoJSON
- spark_write_geoparquet
: to GeoParquet
- spark_write_raster
: to raster tiles after using RS output functions (RS_AsXXX)
Arguments
- x
A Spark DataFrame or dplyr operation
- path
The path to the file. Needs to be accessible from the cluster. Supports the "hdfs://", "s3a://" and "file://" protocols.
- mode
A character element. Specifies the behavior when data or a table already exists. Supported values include: 'error', 'append', 'overwrite' and 'ignore'. Notice that 'overwrite' will also change the column structure. For more details see also https://spark.apache.org/docs/latest/sql-programming-guide.html for your version of Spark.
- options
A list of strings with additional options.
- partition_by
A character vector. Partitions the output by the given columns on the file system.
- ...
Optional arguments; currently unused.
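As a sketch of how `path`, `mode`, and `partition_by` combine, assuming an active Spark connection with Sedona registered and a Spark DataFrame `geo_tbl` holding a geometry column (`geo_tbl` and the column name `region` are illustrative, not part of the API):

```r
# `geo_tbl` and the "region" column are hypothetical; adapt to your data.
spark_write_geoparquet(
  geo_tbl,
  path = "hdfs:///data/geo_out",  # must be reachable from the cluster
  mode = "overwrite",             # replaces existing output, including its column structure
  partition_by = c("region")      # writes one subdirectory per distinct region value
)
```

Partitioning by a column produces a `region=<value>/` directory layout under `path`, which downstream readers can use to prune files.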
See also
Other Sedona DF data interface functions:
spark_read_shapefile()
Examples
library(sparklyr)
library(apache.sedona)

sc <- spark_connect(master = "spark://HOST:PORT")

if (!inherits(sc, "test_connection")) {
  tbl <- dplyr::tbl(
    sc,
    dplyr::sql("SELECT ST_GeomFromText('POINT(-71.064544 42.28787)') AS `pt`")
  )
  spark_write_geojson(
    tbl %>% dplyr::mutate(id = 1),
    path = "/tmp/pts.geojson"
  )
}