sedona.spark.utils package

Submodules

sedona.spark.utils.abstract_parser module

class sedona.spark.utils.abstract_parser.GeometryParser[source]

Bases: ABC

__init__() None

Method generated by attrs for class GeometryParser.

classmethod deserialize(bin_parser: BinaryParser) BaseGeometry[source]
property name
classmethod serialize(obj: BaseGeometry, binary_buffer: BinaryBuffer)[source]

sedona.spark.utils.adapter module

class sedona.spark.utils.adapter.Adapter[source]

Bases: object

Converts between Spark DataFrames and SpatialRDDs in both directions. The schema is lost during the conversion. Use this class if your data starts as a SpatialRDD and you want to convert it to a DataFrame for further processing.

toDf

A multimethod (see sedona.spark.utils.meta.MultiMethod); its individual overloads are not listed here.

classmethod toRdd(dataFrame: DataFrame) JvmSpatialRDD[source]
toSpatialRdd

A multimethod (see sedona.spark.utils.meta.MultiMethod); its individual overloads are not listed here.

sedona.spark.utils.binary_parser module

class sedona.spark.utils.binary_parser.BinaryBuffer[source]

Bases: object

__init__()[source]
add_empty_bytes(tp: str, number_of_empty)[source]
property byte_array
put(value)[source]
put_byte(value)[source]
put_double(value)[source]
put_int(value)[source]
class sedona.spark.utils.binary_parser.BinaryParser(bytes: bytearray | List[int], current_index=attr_dict['current_index'].default)[source]

Bases: object

__init__(bytes: bytearray | List[int], current_index=attr_dict['current_index'].default) None

Method generated by attrs for class BinaryParser.

read_boolean()[source]
read_byte()[source]
read_char()[source]
read_double()[source]
read_double_reverse()[source]
read_geometry(length: int)[source]
read_int()[source]
read_kryo_string(length: int, sc: SparkContext) str[source]
read_string(length: int, encoding: str = 'utf8')[source]
classmethod remove_negative(byte)[source]
classmethod remove_negatives(bytes)[source]
unpack(tp: str, bytes: bytearray)[source]
unpack_reverse(tp: str, bytes: bytearray)[source]

sedona.spark.utils.decorators module

class sedona.spark.utils.decorators.classproperty(f)[source]

Bases: object

__init__(f)[source]
sedona.spark.utils.decorators.get_first_meet_criteria_element_from_iterable(iterable: Iterable[T], criteria: Callable[[T], int]) int[source]
sedona.spark.utils.decorators.require(library_names: List[str])[source]
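The classproperty entry above lists only a constructor taking a function f. As a rough illustration of what a decorator with that shape usually does, here is a minimal descriptor sketch. It is a hypothetical stand-in written for this page, not Sedona's source; the names SketchClassProperty and Example are invented for the example.

```python
class SketchClassProperty:
    """Hypothetical stand-in: exposes a function as a read-only class-level attribute."""

    def __init__(self, f):
        self.f = f

    def __get__(self, instance, owner):
        # Invoked for both Class.attr and instance.attr; the class is always passed to f.
        return self.f(owner)


class Example:
    _label = "example"

    @SketchClassProperty
    def label(cls):
        return cls._label.upper()


# Works on the class itself and on instances, without calling parentheses.
print(Example.label)
print(Example().label)
```

Because the descriptor defines only __get__, assigning to Example.label replaces it rather than raising an error; a production version may want to guard against that.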

sedona.spark.utils.geometry_adapter module

class sedona.spark.utils.geometry_adapter.GeometryAdapter[source]

Bases: object

classmethod create_jvm_geometry_from_base_geometry(jvm, geom: BaseGeometry)[source]
Parameters:
  • jvm

  • geom

Returns:

sedona.spark.utils.geometry_serde module

sedona.spark.utils.geometry_serde.deserialize(buf: bytearray) BaseGeometry | None[source]
sedona.spark.utils.geometry_serde.find_geos_c_dll()[source]

sedona.spark.utils.geometry_serde_general module

class sedona.spark.utils.geometry_serde_general.CoordinateType[source]

Bases: object

Constants used to identify geometry dimensions in the serialized bytearray of geometry.

BYTES_PER_COORDINATE = [16, 24, 24, 32]
NUM_COORD_COMPONENTS = [2, 3, 3, 4]
UNPACK_FORMAT = ['dd', 'ddd', 'ddxxxxxxxx', 'dddxxxxxxxx']
XY = 1
XYM = 3
XYZ = 2
XYZM = 4
static bytes_per_coord(coord_type: int) int[source]
static components_per_coord(coord_type: int) int[source]
static type_of(geom) int[source]
static unpack_format(coord_type: int) str[source]
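The three tables in CoordinateType line up entry by entry: each UNPACK_FORMAT string consumes exactly the number of bytes given in BYTES_PER_COORDINATE. The sketch below checks this with the standard struct module and reads one coordinate from a buffer. It assumes that constant N selects entry N - 1 of each table and that values are little-endian doubles; neither is stated on this page, and the helper names are invented for the example.

```python
import struct

# Values copied from the CoordinateType listing above.
XY, XYZ, XYM, XYZM = 1, 2, 3, 4
BYTES_PER_COORDINATE = [16, 24, 24, 32]
UNPACK_FORMAT = ["dd", "ddd", "ddxxxxxxxx", "dddxxxxxxxx"]


def table_index(coord_type: int) -> int:
    # Assumption: constant N selects entry N - 1 of each table.
    return coord_type - 1


def read_coordinate_sketch(buffer: bytes, offset: int, coord_type: int) -> tuple:
    # Assumption: little-endian doubles. Each 'x' in the format skips one byte.
    fmt = "<" + UNPACK_FORMAT[table_index(coord_type)]
    return struct.unpack_from(fmt, buffer, offset)


# Every format spans exactly the listed number of bytes.
for i, fmt in enumerate(UNPACK_FORMAT):
    assert struct.calcsize("<" + fmt) == BYTES_PER_COORDINATE[i]

# A 24-byte XYM coordinate: the trailing 8 bytes are skipped, leaving (x, y).
xym = struct.pack("<ddd", 1.0, 2.0, 99.0)
print(read_coordinate_sketch(xym, 0, XYM))
```

The eight pad bytes in the XYM and XYZM formats mean the last stored component is stepped over rather than returned, so under these assumptions an XYM coordinate unpacks to two values and an XYZM coordinate to three.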
class sedona.spark.utils.geometry_serde_general.GeometryBuffer(buffer: bytearray, coord_type: int, coords_offset: int, num_coords: int)[source]

Bases: object

__init__(buffer: bytearray, coord_type: int, coords_offset: int, num_coords: int) None[source]
buffer: bytearray
bytes_per_coord: int
coord_type: int
coords_offset: int
ints_offset: int
num_coords: int
read_coordinate() Tuple[float, float] | Tuple[float, float, float] | Tuple[float, float, float, float][source]
read_coordinates(num_coords: int) List[Tuple[float, float]] | List[Tuple[float, float, float]] | List[Tuple[float, float, float, float]][source]
read_int() int[source]
read_linearring() LineString[source]
read_linestring() LineString[source]
read_polygon() Polygon[source]
write_int(value: int) None[source]
write_linestring(line: LineString) None[source]
write_polygon(polygon: Polygon) None[source]
class sedona.spark.utils.geometry_serde_general.GeometryTypeID[source]

Bases: object

Constants used to identify the geometry type in the serialized bytearray of geometry.

GEOMETRYCOLLECTION = 7
LINESTRING = 2
MULTILINESTRING = 5
MULTIPOINT = 4
MULTIPOLYGON = 6
POINT = 1
POLYGON = 3
sedona.spark.utils.geometry_serde_general.aligned_offset(offset: int) int[source]
sedona.spark.utils.geometry_serde_general.create_buffer_for_geom(geom_type: int, coord_type: int, size: int, num_coords: int) bytearray[source]
sedona.spark.utils.geometry_serde_general.deserialize(buffer: bytes) BaseGeometry | None[source]

Deserialize a shapely geometry object from the internal representation of GeometryUDT.

Parameters:
  • buffer – internal representation of GeometryUDT

Returns:
  shapely geometry object

sedona.spark.utils.geometry_serde_general.deserialize_geometry_collection(geom_buffer: GeometryBuffer) GeometryCollection[source]
sedona.spark.utils.geometry_serde_general.deserialize_linestring(geom_buffer: GeometryBuffer) LineString[source]
sedona.spark.utils.geometry_serde_general.deserialize_multi_linestring(geom_buffer: GeometryBuffer) MultiLineString[source]
sedona.spark.utils.geometry_serde_general.deserialize_multi_point(geom_buffer: GeometryBuffer) MultiPoint[source]
sedona.spark.utils.geometry_serde_general.deserialize_multi_polygon(geom_buffer: GeometryBuffer) MultiPolygon[source]
sedona.spark.utils.geometry_serde_general.deserialize_point(geom_buffer: GeometryBuffer) Point[source]
sedona.spark.utils.geometry_serde_general.deserialize_polygon(geom_buffer: GeometryBuffer) Polygon[source]
sedona.spark.utils.geometry_serde_general.generate_header_bytes(geom_type: int, coord_type: int, num_coords: int) bytes[source]
sedona.spark.utils.geometry_serde_general.get_coordinate(buffer: bytearray, offset: int, coord_type: int) Tuple[float, float] | Tuple[float, float, float] | Tuple[float, float, float, float][source]
sedona.spark.utils.geometry_serde_general.get_coordinates(buffer: bytearray, offset: int, coord_type: int, num_coords: int) ndarray | List[Tuple[float, float]] | List[Tuple[float, float, float]] | List[Tuple[float, float, float, float]][source]
sedona.spark.utils.geometry_serde_general.put_coordinate(buffer: bytearray, offset: int, coord_type: int, coord: Tuple[float, float] | Tuple[float, float, float] | Tuple[float, float, float, float]) int[source]
sedona.spark.utils.geometry_serde_general.put_coordinates(buffer: bytearray, offset: int, coord_type: int, coords: List[Tuple[float, float]] | List[Tuple[float, float, float]] | List[Tuple[float, float, float, float]]) int[source]
sedona.spark.utils.geometry_serde_general.serialize(geom: BaseGeometry) bytes | bytearray | None[source]

Serialize a shapely geometry object to the internal representation of GeometryUDT.

Parameters:
  • geom – shapely geometry object

Returns:
  internal representation of GeometryUDT

sedona.spark.utils.geometry_serde_general.serialize_geometry_collection(geom: GeometryCollection) bytearray[source]
sedona.spark.utils.geometry_serde_general.serialize_linestring(geom: LineString) bytes[source]
sedona.spark.utils.geometry_serde_general.serialize_multi_linestring(geom: MultiLineString) bytes[source]
sedona.spark.utils.geometry_serde_general.serialize_multi_point(geom: MultiPoint) bytes[source]
sedona.spark.utils.geometry_serde_general.serialize_multi_polygon(geom: MultiPolygon) bytes[source]
sedona.spark.utils.geometry_serde_general.serialize_point(geom: Point) bytes[source]
sedona.spark.utils.geometry_serde_general.serialize_polygon(geom: Polygon) bytes[source]
sedona.spark.utils.geometry_serde_general.serialize_shapely_1_empty_geom(geom: BaseGeometry) bytearray[source]

sedona.spark.utils.geomserde_speedup module

Geometry serialization/deserialization module.

sedona.spark.utils.geomserde_speedup.deserialize()

Deserialize bytes-like object to geometry object.

sedona.spark.utils.geomserde_speedup.load_libgeos_c()

Load libgeos_c.

sedona.spark.utils.geomserde_speedup.serialize()

Serialize geometry object as bytearray.

sedona.spark.utils.jvm module

class sedona.spark.utils.jvm.JvmStorageLevel(jvm, storage_level)[source]

Bases: JvmObject

__init__(jvm, storage_level) None

Method generated by attrs for class JvmStorageLevel.

sedona.spark.utils.meta module

class sedona.spark.utils.meta.GenericMeta[source]

Bases: type

class sedona.spark.utils.meta.MultiDict[source]

Bases: dict

Special dictionary to build multimethods in a metaclass

class sedona.spark.utils.meta.MultiMethod(name)[source]

Bases: object

Represents a single multimethod.

__init__(name)[source]
register(meth)[source]

Register a new method as a multimethod.

Parameters:
  • meth

Returns:

class sedona.spark.utils.meta.MultipleMeta(clsname, bases, clsdict)[source]

Bases: type

Metaclass that allows multiple dispatch of methods
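The three classes above (a special dictionary, a multimethod object, and a metaclass) match a common pattern for dispatching same-named methods on argument types. The sketch below shows one way the pieces can fit together. It is a simplified illustration with invented Sketch* names, not Sedona's implementation, and it matches exact argument types only, with no subclass or typing-generic handling.

```python
import inspect


class SketchMultiMethod:
    """Holds several implementations of one method name, keyed by argument types."""

    def __init__(self, name):
        self._name = name
        self._methods = {}

    def register(self, meth):
        # The annotations of every parameter after `self` form the dispatch key.
        params = list(inspect.signature(meth).parameters.values())[1:]
        self._methods[tuple(p.annotation for p in params)] = meth

    def __call__(self, *args):
        # args[0] is the instance; dispatch on the exact types of the rest.
        key = tuple(type(a) for a in args[1:])
        meth = self._methods.get(key)
        if meth is None:
            raise TypeError(f"no {self._name} overload for {key}")
        return meth(*args)

    def __get__(self, instance, owner):
        # Bind to the instance so obj.method(...) passes `self` first.
        if instance is None:
            return self
        return lambda *args: self(instance, *args)


class SketchMultiDict(dict):
    """Class-body namespace that merges same-named functions into one multimethod."""

    def __setitem__(self, key, value):
        current = self.get(key)
        if isinstance(current, SketchMultiMethod):
            current.register(value)
        elif callable(current) and callable(value):
            multimethod = SketchMultiMethod(key)
            multimethod.register(current)
            multimethod.register(value)
            super().__setitem__(key, multimethod)
        else:
            super().__setitem__(key, value)


class SketchMultipleMeta(type):
    @classmethod
    def __prepare__(mcs, clsname, bases):
        # The class body is executed with this namespace instead of a plain dict.
        return SketchMultiDict()

    def __new__(mcs, clsname, bases, clsdict):
        return super().__new__(mcs, clsname, bases, dict(clsdict))


class Describer(metaclass=SketchMultipleMeta):
    def describe(self, x: int):
        return f"int {x}"

    def describe(self, x: str):
        return f"str {x}"


print(Describer().describe(1))
print(Describer().describe("a"))
```

The key design point is __prepare__: without a custom namespace, the second describe definition would silently overwrite the first before the metaclass ever saw it.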

sedona.spark.utils.meta.is_subclass_with_typing(type_a: Any, type_b: Any)[source]

sedona.spark.utils.prep module

sedona.spark.utils.prep.assign_all() bool[source]
sedona.spark.utils.prep.assign_udt_geography()[source]
sedona.spark.utils.prep.assign_udt_raster()[source]
sedona.spark.utils.prep.assign_udt_shapely_objects(geoms: List[type]) bool[source]
sedona.spark.utils.prep.assign_user_data_to_shapely_objects(geoms: List[type]) bool[source]

sedona.spark.utils.serde module

class sedona.spark.utils.serde.KryoSerializer[source]

Bases: Serializer

getName = 'org.apache.spark.serializer.KryoSerializer'
class sedona.spark.utils.serde.SedonaKryoRegistrator[source]

Bases: Serializer

getName = 'org.apache.sedona.core.serde.SedonaKryoRegistrator'
class sedona.spark.utils.serde.Serializer[source]

Bases: ABC

getName
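The two getName values above are JVM class names, which suggests they are meant to be passed to Spark as configuration values. The sketch below pairs them with the standard Spark settings spark.serializer and spark.kryo.registrator. The helper apply_conf is hypothetical and works with any builder-like object exposing config(key, value); whether a given Sedona version needs these settings set manually is an assumption to verify against its setup guide.

```python
# Class names copied from the getName attributes listed above.
KRYO_SERIALIZER = "org.apache.spark.serializer.KryoSerializer"
SEDONA_KRYO_REGISTRATOR = "org.apache.sedona.core.serde.SedonaKryoRegistrator"

# spark.serializer and spark.kryo.registrator are standard Spark configuration keys.
spark_conf = {
    "spark.serializer": KRYO_SERIALIZER,
    "spark.kryo.registrator": SEDONA_KRYO_REGISTRATOR,
}


def apply_conf(builder, conf):
    """Apply each key/value pair to a SparkSession.builder-like object (hypothetical helper)."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder
```

With PySpark installed, this would be used as apply_conf(SparkSession.builder, spark_conf).getOrCreate().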

sedona.spark.utils.spatial_rdd_parser module

class sedona.spark.utils.spatial_rdd_parser.AbstractSpatialRDDParser[source]

Bases: ABC

__init__() None

Method generated by attrs for class AbstractSpatialRDDParser.

classmethod deserialize(bin_parser: BinaryParser) BaseGeometry[source]
classmethod serialize(obj: List[Any], binary_buffer: BinaryBuffer) bytearray[source]
class sedona.spark.utils.spatial_rdd_parser.CircleGeometryFactory[source]

Bases: object

__init__() None

Method generated by attrs for class CircleGeometryFactory.

classmethod geometry_from_bytes(bin_parser: BinaryParser) GeoData[source]
classmethod to_bytes(geom: Circle) List[int][source]
class sedona.spark.utils.spatial_rdd_parser.GeoData(geom: BaseGeometry, userData: str)[source]

Bases: object

__init__(geom: BaseGeometry, userData: str)[source]
Parameters:
  • geom

  • userData

property geom
getUserData()[source]
property userData
class sedona.spark.utils.spatial_rdd_parser.GeometryFactory[source]

Bases: object

__init__() None

Method generated by attrs for class GeometryFactory.

classmethod geometry_from_bytes(bin_parser: BinaryParser) GeoData[source]
classmethod to_bytes(geom: BaseGeometry) List[int][source]
class sedona.spark.utils.spatial_rdd_parser.SedonaPickler[source]

Bases: CloudPickleSerializer

__init__()[source]
dumps(obj)[source]

Serialize an object into a byte array. When batching is used, this will be called with an array of objects.

get_parser(number: int)[source]
loads(obj, encoding='bytes')[source]

Deserialize an object from a byte array.

class sedona.spark.utils.spatial_rdd_parser.SpatialPairRDDParserData[source]

Bases: AbstractSpatialRDDParser

__init__() None

Method generated by attrs for class SpatialPairRDDParserData.

classmethod deserialize(bin_parser: BinaryParser)[source]
name = 'SpatialPairRDDParserData'
classmethod serialize(obj: BaseGeometry, binary_buffer: BinaryBuffer)[source]
class sedona.spark.utils.spatial_rdd_parser.SpatialRDDParserData[source]

Bases: AbstractSpatialRDDParser

__init__() None

Method generated by attrs for class SpatialRDDParserData.

classmethod deserialize(bin_parser: BinaryParser)[source]
name = 'SpatialRDDParser'
classmethod serialize(obj: BaseGeometry, binary_buffer: BinaryBuffer)[source]
class sedona.spark.utils.spatial_rdd_parser.SpatialRDDParserDataMultipleRightGeom[source]

Bases: AbstractSpatialRDDParser

__init__() None

Method generated by attrs for class SpatialRDDParserDataMultipleRightGeom.

classmethod deserialize(bin_parser: BinaryParser)[source]
name = 'SpatialRDDParser'
classmethod serialize(obj: BaseGeometry, binary_buffer: BinaryBuffer)[source]
sedona.spark.utils.spatial_rdd_parser.read_geometry_from_bytes(bin_parser: BinaryParser)[source]

sedona.spark.utils.structured_adapter module

class sedona.spark.utils.structured_adapter.StructuredAdapter[source]

Bases: object

Converts between Spark DataFrames and SpatialRDDs in both directions. The schema is lost during the conversion. Use this class if your data starts as a SpatialRDD and you want to convert it to a DataFrame for further processing.

classmethod pairRddToDf(rawPairRDD: SedonaPairRDD, left_schema: StructType, right_schema: StructType, sparkSession: SparkSession) DataFrame[source]

Convert a raw pair RDD to a DataFrame. This is useful when you have a spatial join result.

Parameters:
  • rawPairRDD

  • left_schema

  • right_schema

  • sparkSession

Returns:

classmethod toDf(spatialRDD: SpatialRDD, sparkSession: SparkSession) DataFrame[source]

Convert a SpatialRDD to a DataFrame.

Parameters:
  • spatialRDD

  • sparkSession

Returns:

classmethod toSpatialPartitionedDf(spatialRDD: SpatialRDD, sparkSession: SparkSession) DataFrame[source]

Convert a SpatialRDD to a spatially partitioned DataFrame.

Parameters:
  • spatialRDD

  • sparkSession

Returns:

classmethod toSpatialRdd(dataFrame: DataFrame, geometryFieldName: str = None) SpatialRDD[source]

Convert a DataFrame to a SpatialRDD.

Parameters:
  • dataFrame

  • geometryFieldName

Returns:
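The toSpatialRdd and toDf signatures above can be combined into a DataFrame-to-RDD-and-back round trip. The following is a sketch only: it relies on the signatures exactly as listed on this page, the function name round_trip and the default column name "geometry" are hypothetical, and running it requires a working Sedona and Spark installation. The import is placed inside the function so that the block can be loaded without Sedona present.

```python
def round_trip(df, spark, geometry_column="geometry"):
    """Convert a DataFrame to a SpatialRDD and back (sketch; needs Sedona and Spark).

    df is a PySpark DataFrame with a geometry column, spark is the active
    SparkSession. The default column name is a placeholder for illustration.
    """
    # Imported lazily so that defining this function does not require Sedona.
    from sedona.spark.utils.structured_adapter import StructuredAdapter

    spatial_rdd = StructuredAdapter.toSpatialRdd(df, geometry_column)
    # RDD-level work (for example spatial partitioning) would go here.
    return StructuredAdapter.toDf(spatial_rdd, spark)
```

A typical call would be result_df = round_trip(df, spark), after which result_df can be used with regular DataFrame operations.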

sedona.spark.utils.types module

Module contents