Functions to read geospatial data from a variety of formats into Spark DataFrames.
- spark_read_shapefile: from a shapefile
- spark_read_geojson: from a GeoJSON file
- spark_read_geoparquet: from a GeoParquet file
- spark_read_geotiff: from a GeoTiff file, or a folder containing GeoTiff files
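All four readers share the same call pattern; a minimal sketch (assuming an active Spark connection sc with apache.sedona loaded; the table name and path are placeholders):

# returns a Spark DataFrame backed by the named table
sdf <- spark_read_geoparquet(sc, name = "parks", path = "s3a://bucket/parks.parquet")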
Usage
spark_read_shapefile(sc, name = NULL, path = name, options = list(), ...)
spark_read_geojson(
  sc,
  name = NULL,
  path = name,
  options = list(),
  repartition = 0,
  memory = TRUE,
  overwrite = TRUE
)
spark_read_geoparquet(
  sc,
  name = NULL,
  path = name,
  options = list(),
  repartition = 0,
  memory = TRUE,
  overwrite = TRUE
)
spark_read_geotiff(
  sc,
  name = NULL,
  path = name,
  options = list(),
  repartition = 0,
  memory = TRUE,
  overwrite = TRUE
)
Arguments
- sc
- A spark_connection.
- name
- The name to assign to the newly generated table. 
- path
- The path to the file. Needs to be accessible from the cluster. Supports the "hdfs://", "s3a://" and "file://" protocols. 
- options
- A list of strings with additional options. See https://spark.apache.org/docs/latest/sql-programming-guide.html#configuration. 
- ...
- Optional arguments; currently unused. 
- repartition
- The number of partitions used to distribute the generated table. Use 0 (the default) to avoid partitioning. 
- memory
- Boolean; should the data be loaded eagerly into memory? (That is, should the table be cached?) 
- overwrite
- Boolean; overwrite the table with the given name if it already exists? 
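A sketch of how these arguments combine (sc is an active connection; the path is a placeholder):

# read a large GeoJSON file into 16 partitions, skip eager caching,
# and replace any existing table named "boundaries"
sdf <- spark_read_geojson(
  sc,
  name = "boundaries",
  path = "s3a://bucket/boundaries.geojson",
  repartition = 16,
  memory = FALSE,
  overwrite = TRUE
)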
See also
Other Sedona DF data interface functions: 
spark_write_geojson()
Examples
library(sparklyr)
library(apache.sedona)
sc <- spark_connect(master = "spark://HOST:PORT")
if (!inherits(sc, "test_connection")) {
  input_location <- "/dev/null" # replace it with the path to your input file
  sdf <- spark_read_shapefile(sc, path = input_location)
}
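
# spark_read_geotiff() also accepts a folder containing GeoTiff files;
# a sketch with a placeholder folder path:
if (!inherits(sc, "test_connection")) {
  raster_sdf <- spark_read_geotiff(sc, name = "rasters", path = "/path/to/geotiff/folder")
}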
