sedona.spark.utils package
Submodules
sedona.spark.utils.abstract_parser module
- class sedona.spark.utils.abstract_parser.GeometryParser[source]
Bases:
ABC
- classmethod deserialize(bin_parser: BinaryParser) BaseGeometry [source]
- property name
- classmethod serialize(obj: BaseGeometry, binary_buffer: BinaryBuffer)[source]
sedona.spark.utils.adapter module
- class sedona.spark.utils.adapter.Adapter[source]
Bases:
object
Converts between a PySpark DataFrame and a SpatialRDD, in both directions. The schema is lost during the conversion. Use this class if your data starts as a SpatialRDD and you want to convert it to a DataFrame for further processing.
- toDf
Represents a single multimethod.
- classmethod toRdd(dataFrame: DataFrame) JvmSpatialRDD [source]
- toSpatialRdd
Represents a single multimethod.
sedona.spark.utils.binary_parser module
- class sedona.spark.utils.binary_parser.BinaryParser(bytes: bytearray | List[int], current_index=attr_dict['current_index'].default)[source]
Bases:
object
sedona.spark.utils.decorators module
sedona.spark.utils.geometry_adapter module
sedona.spark.utils.geometry_serde module
sedona.spark.utils.geometry_serde_general module
- class sedona.spark.utils.geometry_serde_general.CoordinateType[source]
Bases:
object
Constants used to identify geometry dimensions in the serialized bytearray of geometry.
- BYTES_PER_COORDINATE = [16, 24, 24, 32]
- NUM_COORD_COMPONENTS = [2, 3, 3, 4]
- UNPACK_FORMAT = ['dd', 'ddd', 'ddxxxxxxxx', 'dddxxxxxxxx']
- XY = 1
- XYM = 3
- XYZ = 2
- XYZM = 4
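The three tables above line up position by position with the dimension constants. The following is a minimal sketch of that relationship using only the standard `struct` module. It assumes the tables are indexed by `coord_type - 1` and that doubles are stored little-endian; both are assumptions read off the listing, not confirmed by it. `unpack_coordinate` is a hypothetical helper, not part of Sedona.

```python
import struct

# Values copied from the CoordinateType listing above.
XY, XYZ, XYM, XYZM = 1, 2, 3, 4
BYTES_PER_COORDINATE = [16, 24, 24, 32]
NUM_COORD_COMPONENTS = [2, 3, 3, 4]
UNPACK_FORMAT = ["dd", "ddd", "ddxxxxxxxx", "dddxxxxxxxx"]


def unpack_coordinate(buffer, offset, coord_type):
    """Hypothetical helper: read one coordinate of the given type."""
    # Assumption: tables are indexed by coord_type - 1, doubles are little-endian.
    fmt = "<" + UNPACK_FORMAT[coord_type - 1]
    return struct.unpack_from(fmt, buffer, offset)


# Each format string consumes exactly BYTES_PER_COORDINATE bytes; the eight
# trailing 'x' pad bytes skip one 8-byte double without returning it.
sizes = [struct.calcsize("<" + fmt) for fmt in UNPACK_FORMAT]

# An XYM coordinate occupies 24 bytes, but only x and y are returned.
xym_bytes = struct.pack("<ddd", 1.0, 2.0, 99.0)
xym_value = unpack_coordinate(xym_bytes, 0, XYM)
```

Under these assumptions `sizes` matches `BYTES_PER_COORDINATE`, and the format strings for `XYM` and `XYZM` step over the M value instead of returning it.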
- class sedona.spark.utils.geometry_serde_general.GeometryBuffer(buffer: bytearray, coord_type: int, coords_offset: int, num_coords: int)[source]
Bases:
object
- read_coordinate() Tuple[float, float] | Tuple[float, float, float] | Tuple[float, float, float, float] [source]
- read_coordinates(num_coords: int) List[Tuple[float, float]] | List[Tuple[float, float, float]] | List[Tuple[float, float, float, float]] [source]
- read_linearring() LineString [source]
- read_linestring() LineString [source]
- write_linestring(line: LineString) None [source]
- class sedona.spark.utils.geometry_serde_general.GeometryTypeID[source]
Bases:
object
Constants used to identify the geometry type in the serialized bytearray of geometry.
- GEOMETRYCOLLECTION = 7
- LINESTRING = 2
- MULTILINESTRING = 5
- MULTIPOINT = 4
- MULTIPOLYGON = 6
- POINT = 1
- POLYGON = 3
- sedona.spark.utils.geometry_serde_general.create_buffer_for_geom(geom_type: int, coord_type: int, size: int, num_coords: int) bytearray [source]
- sedona.spark.utils.geometry_serde_general.deserialize(buffer: bytes) BaseGeometry | None [source]
Deserialize a shapely geometry object from the internal representation of GeometryUDT.
Parameters:
buffer: internal representation of GeometryUDT
Returns:
shapely geometry object
- sedona.spark.utils.geometry_serde_general.deserialize_geometry_collection(geom_buffer: GeometryBuffer) GeometryCollection [source]
- sedona.spark.utils.geometry_serde_general.deserialize_linestring(geom_buffer: GeometryBuffer) LineString [source]
- sedona.spark.utils.geometry_serde_general.deserialize_multi_linestring(geom_buffer: GeometryBuffer) MultiLineString [source]
- sedona.spark.utils.geometry_serde_general.deserialize_multi_point(geom_buffer: GeometryBuffer) MultiPoint [source]
- sedona.spark.utils.geometry_serde_general.deserialize_multi_polygon(geom_buffer: GeometryBuffer) MultiPolygon [source]
- sedona.spark.utils.geometry_serde_general.deserialize_point(geom_buffer: GeometryBuffer) Point [source]
- sedona.spark.utils.geometry_serde_general.deserialize_polygon(geom_buffer: GeometryBuffer) Polygon [source]
- sedona.spark.utils.geometry_serde_general.generate_header_bytes(geom_type: int, coord_type: int, num_coords: int) bytes [source]
- sedona.spark.utils.geometry_serde_general.get_coordinate(buffer: bytearray, offset: int, coord_type: int) Tuple[float, float] | Tuple[float, float, float] | Tuple[float, float, float, float] [source]
- sedona.spark.utils.geometry_serde_general.get_coordinates(buffer: bytearray, offset: int, coord_type: int, num_coords: int) ndarray | List[Tuple[float, float]] | List[Tuple[float, float, float]] | List[Tuple[float, float, float, float]] [source]
- sedona.spark.utils.geometry_serde_general.put_coordinate(buffer: bytearray, offset: int, coord_type: int, coord: Tuple[float, float] | Tuple[float, float, float] | Tuple[float, float, float, float]) int [source]
- sedona.spark.utils.geometry_serde_general.put_coordinates(buffer: bytearray, offset: int, coord_type: int, coords: List[Tuple[float, float]] | List[Tuple[float, float, float]] | List[Tuple[float, float, float, float]]) int [source]
- sedona.spark.utils.geometry_serde_general.serialize(geom: BaseGeometry) bytes | bytearray | None [source]
Serialize a shapely geometry object to the internal representation of GeometryUDT.
Parameters:
geom: shapely geometry object
Returns:
internal representation of GeometryUDT
- sedona.spark.utils.geometry_serde_general.serialize_geometry_collection(geom: GeometryCollection) bytearray [source]
- sedona.spark.utils.geometry_serde_general.serialize_linestring(geom: LineString) bytes [source]
- sedona.spark.utils.geometry_serde_general.serialize_multi_linestring(geom: MultiLineString) bytes [source]
- sedona.spark.utils.geometry_serde_general.serialize_multi_point(geom: MultiPoint) bytes [source]
- sedona.spark.utils.geometry_serde_general.serialize_multi_polygon(geom: MultiPolygon) bytes [source]
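The `put_coordinates` and `get_coordinates` signatures listed in this module suggest an offset-based style of writing into and reading from a preallocated `bytearray`. The sketch below illustrates that style with hypothetical stand-ins (`put_coordinates_sketch`, `get_coordinates_sketch`); it is not Sedona's implementation. It assumes the returned `int` is the offset just past the written data, that doubles are little-endian, and it handles only the XY and XYZ cases.

```python
import struct

XY, XYZ = 1, 2  # values from the CoordinateType listing


def _num_components(coord_type):
    """Number of doubles per coordinate for the two types handled here."""
    if coord_type == XY:
        return 2
    if coord_type == XYZ:
        return 3
    raise ValueError(f"unsupported coord_type in this sketch: {coord_type}")


def put_coordinates_sketch(buffer, offset, coord_type, coords):
    """Hypothetical stand-in: write coords at offset, return the next offset."""
    n = _num_components(coord_type)
    for coord in coords:
        struct.pack_into("<" + "d" * n, buffer, offset, *coord)
        offset += 8 * n
    return offset


def get_coordinates_sketch(buffer, offset, coord_type, num_coords):
    """Hypothetical stand-in: read num_coords coordinates starting at offset."""
    n = _num_components(coord_type)
    return [
        struct.unpack_from("<" + "d" * n, buffer, offset + i * 8 * n)
        for i in range(num_coords)
    ]


coords = [(0.0, 0.0), (1.0, 2.0), (3.5, -4.25)]
buf = bytearray(16 * len(coords))  # 16 bytes per XY coordinate
end = put_coordinates_sketch(buf, 0, XY, coords)
decoded = get_coordinates_sketch(buf, 0, XY, len(coords))
```

Returning the next offset lets a caller chain several writes into one buffer without tracking sizes separately.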
sedona.spark.utils.geomserde_speedup module
Geometry serialization/deserialization module.
- sedona.spark.utils.geomserde_speedup.deserialize()
Deserialize bytes-like object to geometry object.
- sedona.spark.utils.geomserde_speedup.load_libgeos_c()
Load libgeos_c.
- sedona.spark.utils.geomserde_speedup.serialize()
Serialize geometry object as bytearray.
sedona.spark.utils.jvm module
sedona.spark.utils.meta module
- class sedona.spark.utils.meta.MultiDict[source]
Bases:
dict
Special dictionary to build multimethods in a metaclass
- class sedona.spark.utils.meta.MultiMethod(name)[source]
Bases:
object
Represents a single multimethod.
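To illustrate what "multimethods in a metaclass" can mean, here is a minimal, self-contained sketch of the general technique: a dictionary subclass collects repeated definitions of the same name during class creation, and a dispatcher picks one by argument type. The class names mirror the listing, but the bodies, the `MultipleMeta` metaclass, and the `Describer` example are illustrative assumptions, not Sedona's code.

```python
import inspect
from types import MethodType


class MultiMethod:
    """Sketch: one name, several implementations chosen by argument types."""

    def __init__(self, name):
        self._name = name
        self._methods = {}

    def register(self, func):
        # Key each implementation by the annotations of its non-self parameters.
        params = list(inspect.signature(func).parameters.values())[1:]
        self._methods[tuple(p.annotation for p in params)] = func

    def __call__(self, *args):
        # args[0] is the instance; dispatch on the exact types of the rest.
        key = tuple(type(a) for a in args[1:])
        func = self._methods.get(key)
        if func is None:
            raise TypeError(f"no implementation of {self._name} for {key}")
        return func(*args)

    def __get__(self, instance, owner):
        # Bind to the instance like an ordinary method.
        return self if instance is None else MethodType(self, instance)


class MultiDict(dict):
    """Sketch: class namespace that merges repeated function definitions."""

    def __setitem__(self, key, value):
        current = self.get(key)
        if isinstance(current, MultiMethod):
            current.register(value)
        elif inspect.isfunction(current) and inspect.isfunction(value):
            multi = MultiMethod(key)
            multi.register(current)
            multi.register(value)
            super().__setitem__(key, multi)
        else:
            super().__setitem__(key, value)


class MultipleMeta(type):
    """Hypothetical metaclass that uses MultiDict as the class namespace."""

    @classmethod
    def __prepare__(mcls, name, bases):
        return MultiDict()

    def __new__(mcls, name, bases, namespace):
        return super().__new__(mcls, name, bases, dict(namespace))


class Describer(metaclass=MultipleMeta):
    def describe(self, value: int):
        return f"int:{value}"

    def describe(self, value: str):
        return f"str:{value}"


describer = Describer()
int_result = describer.describe(3)
str_result = describer.describe("a")
```

This sketch matches on exact types only, so a subclass instance or a keyword argument would not resolve; a fuller implementation would need to decide how to handle those cases.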
sedona.spark.utils.prep module
sedona.spark.utils.serde module
- class sedona.spark.utils.serde.KryoSerializer[source]
Bases:
Serializer
- getName = 'org.apache.spark.serializer.KryoSerializer'
- class sedona.spark.utils.serde.SedonaKryoRegistrator[source]
Bases:
Serializer
- getName = 'org.apache.sedona.core.serde.SedonaKryoRegistrator'
sedona.spark.utils.spatial_rdd_parser module
- class sedona.spark.utils.spatial_rdd_parser.AbstractSpatialRDDParser[source]
Bases:
ABC
- classmethod deserialize(bin_parser: BinaryParser) BaseGeometry [source]
- class sedona.spark.utils.spatial_rdd_parser.CircleGeometryFactory[source]
Bases:
object
- classmethod geometry_from_bytes(bin_parser: BinaryParser) GeoData [source]
- class sedona.spark.utils.spatial_rdd_parser.GeoData(geom: BaseGeometry, userData: str)[source]
Bases:
object
- property geom
- property userData
- class sedona.spark.utils.spatial_rdd_parser.GeometryFactory[source]
Bases:
object
- classmethod geometry_from_bytes(bin_parser: BinaryParser) GeoData [source]
- class sedona.spark.utils.spatial_rdd_parser.SedonaPickler[source]
Bases:
CloudPickleSerializer
- class sedona.spark.utils.spatial_rdd_parser.SpatialPairRDDParserData[source]
Bases:
AbstractSpatialRDDParser
- classmethod deserialize(bin_parser: BinaryParser)[source]
- name = 'SpatialPairRDDParserData'
- classmethod serialize(obj: BaseGeometry, binary_buffer: BinaryBuffer)[source]
- class sedona.spark.utils.spatial_rdd_parser.SpatialRDDParserData[source]
Bases:
AbstractSpatialRDDParser
- classmethod deserialize(bin_parser: BinaryParser)[source]
- name = 'SpatialRDDParser'
- classmethod serialize(obj: BaseGeometry, binary_buffer: BinaryBuffer)[source]
- class sedona.spark.utils.spatial_rdd_parser.SpatialRDDParserDataMultipleRightGeom[source]
Bases:
AbstractSpatialRDDParser
- classmethod deserialize(bin_parser: BinaryParser)[source]
- name = 'SpatialRDDParser'
- classmethod serialize(obj: BaseGeometry, binary_buffer: BinaryBuffer)[source]
- sedona.spark.utils.spatial_rdd_parser.read_geometry_from_bytes(bin_parser: BinaryParser)[source]
sedona.spark.utils.structured_adapter module
- class sedona.spark.utils.structured_adapter.StructuredAdapter[source]
Bases:
object
Converts between a PySpark DataFrame and a SpatialRDD, in both directions. The schema is lost during the conversion. Use this class if your data starts as a SpatialRDD and you want to convert it to a DataFrame for further processing.
- classmethod pairRddToDf(rawPairRDD: SedonaPairRDD, left_schema: StructType, right_schema: StructType, sparkSession: SparkSession) DataFrame [source]
Convert a raw pair RDD to a DataFrame. This is useful when you have a spatial join result.
Parameters:
rawPairRDD (SedonaPairRDD)
left_schema (StructType)
right_schema (StructType)
sparkSession (SparkSession)
Returns:
DataFrame
- classmethod toDf(spatialRDD: SpatialRDD, sparkSession: SparkSession) DataFrame [source]
Convert a SpatialRDD to a DataFrame.
Parameters:
spatialRDD (SpatialRDD)
sparkSession (SparkSession)
Returns:
DataFrame
- classmethod toSpatialPartitionedDf(spatialRDD: SpatialRDD, sparkSession: SparkSession) DataFrame [source]
Convert a SpatialRDD to a DataFrame. The resulting DataFrame is spatially partitioned.
Parameters:
spatialRDD (SpatialRDD)
sparkSession (SparkSession)
Returns:
DataFrame
- classmethod toSpatialRdd(dataFrame: DataFrame, geometryFieldName: str = None) SpatialRDD [source]
Convert a DataFrame to a SpatialRDD.
Parameters:
dataFrame (DataFrame)
geometryFieldName (str, defaults to None)
Returns:
SpatialRDD
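A possible way to combine `toSpatialRdd` and `toDf`, based only on the signatures listed above. The helper name `round_trip` and the default column name `"geometry"` are hypothetical, and the sketch assumes an existing SparkSession with Sedona registered and a DataFrame that has a geometry column. The import is done inside the function so the definition can be loaded without Sedona installed.

```python
def round_trip(df, spark, geometry_column="geometry"):
    """Hypothetical helper: DataFrame -> SpatialRDD -> DataFrame."""
    # Lazy import: requires Apache Sedona at call time, not at definition time.
    from sedona.spark.utils.structured_adapter import StructuredAdapter

    # Signature per the listing: toSpatialRdd(dataFrame, geometryFieldName=None)
    spatial_rdd = StructuredAdapter.toSpatialRdd(df, geometry_column)

    # RDD-level work (for example spatial partitioning) would go here.

    # Signature per the listing: toDf(spatialRDD, sparkSession)
    return StructuredAdapter.toDf(spatial_rdd, spark)
```

A call would look like `result_df = round_trip(df, spark)`, where `df` and `spark` come from your own session.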