sedona.spark.geopandas package

Subpackages

Submodules

sedona.spark.geopandas.base module

A base class that lets Sedona/Spark DataFrames and Columns behave like geopandas GeoDataFrame and GeoSeries objects.

class sedona.spark.geopandas.base.GeoFrame[source]

Bases: object

A base class for both GeoDataFrame and GeoSeries.

property area: pyspark.pandas.Series

Returns a Series containing the area of each geometry in the GeoSeries expressed in the units of the CRS.

Returns:

A Series containing the area of each geometry.

Return type:

Series

Examples

>>> from shapely.geometry import Polygon
>>> from sedona.spark.geopandas import GeoSeries
>>> gs = GeoSeries([Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]), Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])])
>>> gs.area
0    1.0
1    4.0
dtype: float64
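
The values above can be checked by hand with the shoelace formula. The following is a minimal pure-Python sketch, not Sedona's implementation; the helper name shoelace_area is an assumption made for illustration, and it only handles a simple polygon without holes.

```python
def shoelace_area(ring):
    """Planar area of a simple polygon given as a list of (x, y) vertices."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# The two squares from the example above.
unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
big_square = [(0, 0), (2, 0), (2, 2), (0, 2)]
areas = [shoelace_area(unit_square), shoelace_area(big_square)]
print(areas)  # [1.0, 4.0]
```

The result is in squared coordinate units, which is why the area is expressed in the units of the CRS.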
property boundary

Returns a GeoSeries of lower dimensional objects representing each geometry’s set-theoretic boundary.

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...         Point(0, 0),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1        LINESTRING (0 0, 1 1, 1 0)
2                       POINT (0 0)
dtype: geometry
>>> s.boundary
0    LINESTRING (0 0, 1 1, 0 1, 0 0)
1          MULTIPOINT ((0 0), (1 0))
2           GEOMETRYCOLLECTION EMPTY
dtype: geometry

See also

GeoSeries.exterior

outer boundary (without interior rings)
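
For a single LineString, the boundary rule visible in the output above can be sketched in a few lines: an open line has its two endpoints as boundary, and a closed line has an empty boundary. This is an illustrative pure-Python helper (linestring_boundary is a hypothetical name), not the code Sedona runs.

```python
def linestring_boundary(coords):
    """Boundary points of one LineString: both endpoints, or nothing if it is closed."""
    if len(coords) < 2 or coords[0] == coords[-1]:
        return []
    return [coords[0], coords[-1]]


open_line = [(0, 0), (1, 1), (1, 0)]            # LineString from the example
closed_ring = [(0, 0), (1, 1), (0, 1), (0, 0)]  # a closed ring
open_boundary = linestring_boundary(open_line)      # [(0, 0), (1, 0)]
closed_boundary = linestring_boundary(closed_ring)  # []
```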

property bounds: pyspark.pandas.DataFrame

Returns a DataFrame with columns minx, miny, maxx, and maxy containing the bounds of each geometry.

See GeoSeries.total_bounds for the limits of the entire series.

Examples

>>> import geopandas
>>> from shapely.geometry import Point, Polygon, LineString
>>> d = {'geometry': [Point(2, 1), Polygon([(0, 0), (1, 1), (1, 0)]),
... LineString([(0, 1), (1, 2)])]}
>>> gdf = geopandas.GeoDataFrame(d, crs="EPSG:4326")
>>> gdf.bounds
   minx  miny  maxx  maxy
0   2.0   1.0   2.0   1.0
1   0.0   0.0   1.0   1.0
2   0.0   1.0   1.0   2.0

You can assign the bounds to the GeoDataFrame as:

>>> import pandas as pd
>>> gdf = pd.concat([gdf, gdf.bounds], axis=1)
>>> gdf
                        geometry  minx  miny  maxx  maxy
0                     POINT (2 1)   2.0   1.0   2.0   1.0
1  POLYGON ((0 0, 1 1, 1 0, 0 0))   0.0   0.0   1.0   1.0
2           LINESTRING (0 1, 1 2)   0.0   1.0   1.0   2.0
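
The relationship between per-geometry bounds and the limits of the entire series can be sketched on raw coordinates. This is a simplified pure-Python illustration; the helpers bounds and total_bounds are written here for the example and are not Sedona functions.

```python
def bounds(coords):
    """Return (minx, miny, maxx, maxy) for a list of (x, y) coordinates."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (float(min(xs)), float(min(ys)), float(max(xs)), float(max(ys)))


def total_bounds(rows):
    """Combine per-geometry bounds into one box that covers all of them."""
    return (
        min(r[0] for r in rows),
        min(r[1] for r in rows),
        max(r[2] for r in rows),
        max(r[3] for r in rows),
    )


geometries = [
    [(2, 1)],                  # Point(2, 1)
    [(0, 0), (1, 1), (1, 0)],  # Polygon exterior
    [(0, 1), (1, 2)],          # LineString
]
rows = [bounds(coords) for coords in geometries]
overall = total_bounds(rows)
```

Here rows reproduces the three rows of the bounds table above, and overall is the single box (0.0, 0.0, 2.0, 2.0) that encloses all three geometries.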
buffer(distance, resolution=16, cap_style='round', join_style='round', mitre_limit=5.0, single_sided=False, **kwargs)[source]

Returns a GeoSeries with all geometries buffered by the specified distance.

Parameters:
  • distance (float) – The distance to buffer by. Negative distances will create inward buffers.

  • resolution (int, default 16) – The resolution of the buffer around each vertex. Specifies the number of linear segments in a quarter circle in the approximation of circular arcs.

  • cap_style (str, default "round") – The style of the buffer cap. One of ‘round’, ‘flat’, ‘square’.

  • join_style (str, default "round") – The style of the buffer join. One of ‘round’, ‘mitre’, ‘bevel’.

  • mitre_limit (float, default 5.0) – The mitre limit ratio for joins when join_style=’mitre’.

  • single_sided (bool, default False) – Whether to create a single-sided buffer. In Sedona, True will default to left-sided buffer. However, ‘right’ may be specified to use a right-sided buffer.

Returns:

A new GeoSeries with buffered geometries.

Return type:

GeoSeries

Examples

>>> from shapely.geometry import Point
>>> from sedona.spark.geopandas import GeoDataFrame
>>>
>>> data = {
...     'geometry': [Point(0, 0), Point(1, 1)],
...     'value': [1, 2]
... }
>>> gdf = GeoDataFrame(data)
>>> buffered = gdf.buffer(0.5)
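
To see what resolution controls, the sketch below approximates the round buffer of a point by a polygon with resolution segments per quarter circle. It is pure Python and assumes the vertices lie exactly on the circle; the starting angle and vertex order in a real geometry engine may differ, and the function names are hypothetical.

```python
import math


def point_buffer_ring(x, y, distance, resolution=16):
    """Polygon vertices approximating a circle, with `resolution` segments per quarter."""
    n = 4 * resolution
    return [
        (x + distance * math.cos(2 * math.pi * k / n),
         y + distance * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def ring_area(ring):
    """Shoelace area of a simple ring given as (x, y) vertices."""
    n = len(ring)
    total = sum(
        ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
        for i in range(n)
    )
    return abs(total) / 2.0


coarse = ring_area(point_buffer_ring(0, 0, 0.5, resolution=4))
fine = ring_area(point_buffer_ring(0, 0, 0.5, resolution=16))
circle = math.pi * 0.5 ** 2  # area of the exact circle
```

A higher resolution brings the polygon area closer to the true circle area from below, at the cost of more vertices per buffered geometry.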
property centroid

Returns a GeoSeries of points representing the centroid of each geometry.

Note that the centroid does not have to lie on or within the original geometry.

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...         Point(0, 0),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1        LINESTRING (0 0, 1 1, 1 0)
2                       POINT (0 0)
dtype: geometry
>>> s.centroid
0    POINT (0.33333 0.66667)
1        POINT (0.70711 0.5)
2                POINT (0 0)
dtype: geometry

See also

GeoSeries.representative_point

point guaranteed to be within each geometry
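
The centroid values in the example can be reproduced by hand. The sketch below uses the standard area-weighted formula for a simple polygon and a length-weighted average of segment midpoints for a LineString. It is a pure-Python illustration with hypothetical helper names, not Sedona's implementation.

```python
import math


def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as (x, y) vertices."""
    twice_area = 0.0
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        twice_area += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    # Dividing by 6 * area is the same as dividing by 3 * twice_area.
    return (cx / (3.0 * twice_area), cy / (3.0 * twice_area))


def linestring_centroid(coords):
    """Length-weighted average of the segment midpoints of a polyline."""
    total_length = 0.0
    cx = cy = 0.0
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        length = math.hypot(x2 - x1, y2 - y1)
        total_length += length
        cx += length * (x1 + x2) / 2.0
        cy += length * (y1 + y2) / 2.0
    return (cx / total_length, cy / total_length)


triangle_c = polygon_centroid([(0, 0), (1, 1), (0, 1)])  # about (0.33333, 0.66667)
line_c = linestring_centroid([(0, 0), (1, 1), (1, 0)])   # about (0.70711, 0.5)
```

The LineString result, roughly (0.70711, 0.5), is not a point on the line itself, which illustrates that a centroid need not lie on the geometry.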

contains(other, align=None)[source]

Returns a Series of dtype('bool') with value True for each aligned geometry that contains other.

An object is said to contain other if at least one point of other lies in the interior and no points of other lie in the exterior of the object. (Therefore, a polygon does not contain its own boundary, because no point of the boundary lies in the interior.) If either object is empty, this operation returns False.

This is the inverse of within in the sense that the expression a.contains(b) == b.within(a) always evaluates to True.

Note: Unlike GeoPandas, Sedona’s implementation returns False for identical geometries.

The operation works in a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test if it is contained.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.

Return type:

Series (bool)

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (0, 2)]),
...         LineString([(0, 0), (0, 1)]),
...         Point(0, 1),
...     ],
...     index=range(0, 4),
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (1, 2), (0, 2)]),
...         LineString([(0, 0), (0, 2)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1             LINESTRING (0 0, 0 2)
2             LINESTRING (0 0, 0 1)
3                       POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2    POLYGON ((0 0, 1 2, 0 2, 0 0))
3             LINESTRING (0 0, 0 2)
4                       POINT (0 1)
dtype: geometry

We can check if each geometry of GeoSeries contains a single geometry:

>>> point = Point(0, 1)
>>> s.contains(point)
0    False
1     True
2    False
3     True
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s2.contains(s, align=True)
0    False
1    False
2    False
3     True
4    False
dtype: bool
>>> s2.contains(s, align=False)
1     True
2    False
3     True
4     True
dtype: bool

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries contains any element of the other one.

See also

GeoSeries.contains_properly, GeoSeries.within
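
The difference between align=True and align=False can be sketched without any geometry at all, because it only concerns how rows are paired. The helper below, pair_rows, is a hypothetical pure-Python model that uses dicts as stand-ins for indexed series. The length check for positional pairing is an assumption of this sketch.

```python
def pair_rows(left, right, align=True):
    """Pair values of two index-to-value dicts, by label or by position."""
    if align:
        # Pair by index label; a label missing on one side pairs with None.
        labels = sorted(set(left) | set(right))
        return [(label, left.get(label), right.get(label)) for label in labels]
    # Pair by position and keep the labels of the left-hand side.
    if len(left) != len(right):
        raise ValueError("positional pairing needs equal lengths")
    return [(label, lv, rv) for (label, lv), rv in zip(left.items(), right.values())]


left = {1: "A", 2: "B", 3: "C", 4: "D"}   # plays the role of s2 (index 1..4)
right = {0: "a", 1: "b", 2: "c", 3: "d"}  # plays the role of s (index 0..3)

aligned = pair_rows(left, right, align=True)
positional = pair_rows(left, right, align=False)
```

With align=True the result covers the union of labels (0 to 4), which is why the aligned output above has five rows. With align=False it has four rows and carries the index of the calling series.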

contains_properly(other, align=None)[source]
abstractmethod copy() GeoFrameLike[source]
covered_by(other, align=None)[source]

Returns a Series of dtype('bool') with value True for each aligned geometry that is entirely covered by other.

An object A is said to cover another object B if no points of B lie in the exterior of A.

Note: Unlike GeoPandas, Sedona’s implementation returns False for identical geometries. Sedona’s behavior may also differ from GeoPandas for GeometryCollections.

The operation works in a 1-to-1 row-wise manner.

See https://lin-ear-th-inking.blogspot.com/2007/06/subtleties-of-ogc-covers-spatial.html for reference.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test whether it covers each geometry.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.

Return type:

Series (bool)

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]),
...         Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
...         LineString([(1, 1), (1.5, 1.5)]),
...         Point(0, 0),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         Point(0, 0),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0.5 0.5, 1.5 0.5, 1.5 1.5, 0.5 1.5, ...
1                  POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))
2                            LINESTRING (1 1, 1.5 1.5)
3                                          POINT (0 0)
dtype: geometry
>>>
>>> s2
1    POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))
2         POLYGON ((0 0, 2 2, 0 2, 0 0))
3                  LINESTRING (0 0, 2 2)
4                            POINT (0 0)
dtype: geometry

We can check if each geometry of GeoSeries is covered by a single geometry:

>>> poly = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
>>> s.covered_by(poly)
0    True
1    True
2    True
3    True
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.covered_by(s2, align=True)
0    False
1     True
2     True
3     True
4    False
dtype: bool
>>> s.covered_by(s2, align=False)
0     True
1    False
2     True
3     True
dtype: bool

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries is covered_by any element of the other one.

See also

GeoSeries.covers, GeoSeries.overlaps

covers(other, align=None)[source]

Returns a Series of dtype('bool') with value True for each aligned geometry that is entirely covering other.

An object A is said to cover another object B if no points of B lie in the exterior of A. If either object is empty, this operation returns False.

Note: Unlike GeoPandas, Sedona’s implementation returns False for identical geometries. Sedona’s behavior may also differ from GeoPandas for GeometryCollections.

The operation works in a 1-to-1 row-wise manner.

See https://lin-ear-th-inking.blogspot.com/2007/06/subtleties-of-ogc-covers-spatial.html for reference.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test whether it is covered.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.

Return type:

Series (bool)

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         Point(0, 0),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]),
...         Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
...         LineString([(1, 1), (1.5, 1.5)]),
...         Point(0, 0),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))
1         POLYGON ((0 0, 2 2, 0 2, 0 0))
2                  LINESTRING (0 0, 2 2)
3                            POINT (0 0)
dtype: geometry
>>> s2
1    POLYGON ((0.5 0.5, 1.5 0.5, 1.5 1.5, 0.5 1.5, ...
2                  POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))
3                            LINESTRING (1 1, 1.5 1.5)
4                                          POINT (0 0)
dtype: geometry

We can check if each geometry of GeoSeries covers a single geometry:

>>> poly = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
>>> s.covers(poly)
0     True
1    False
2    False
3    False
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.covers(s2, align=True)
0    False
1    False
2    False
3    False
4    False
dtype: bool
>>> s.covers(s2, align=False)
0     True
1    False
2     True
3     True
dtype: bool

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries covers any element of the other one.

See also

GeoSeries.covered_by, GeoSeries.overlaps

crosses(other, align=None) pyspark.pandas.Series[source]

Returns a Series of dtype('bool') with value True for each aligned geometry that crosses other.

An object is said to cross other if its interior intersects the interior of the other but does not contain it, and the dimension of the intersection is less than the dimension of the one or the other.

Note: Unlike GeoPandas, Sedona’s implementation always returns NULL when a GeometryCollection is involved.

The operation works in a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test if it is crossed.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.

Return type:

Series (bool)

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         LineString([(1, 0), (1, 3)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(1, 1),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1             LINESTRING (0 0, 2 2)
2             LINESTRING (2 0, 0 2)
3                       POINT (0 1)
dtype: geometry
>>> s2
1    LINESTRING (1 0, 1 3)
2    LINESTRING (2 0, 0 2)
3              POINT (1 1)
4              POINT (0 1)
dtype: geometry

We can check if each geometry of GeoSeries crosses a single geometry:

>>> line = LineString([(-1, 1), (3, 1)])
>>> s.crosses(line)
0     True
1     True
2     True
3    False
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.crosses(s2, align=True)
0    False
1     True
2    False
3    False
4    False
dtype: bool
>>> s.crosses(s2, align=False)
0     True
1     True
2    False
3    False
dtype: bool

Notice that a line does not cross a point that it contains.

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries crosses any element of the other one.

See also

GeoSeries.disjoint, GeoSeries.intersects
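
For the simplest case of two single-segment lines, the crossing condition reduces to an orientation test: each segment must have the other's endpoints strictly on opposite sides. The sketch below is a pure-Python illustration limited to that case, with hypothetical function names. General geometries need a full DE-9IM evaluation, which this does not attempt.

```python
def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left, -1 right, 0 collinear."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)


def segments_cross(a, b, c, d):
    """True if segments ab and cd meet at one point interior to both."""
    return (
        orientation(a, b, c) * orientation(a, b, d) < 0
        and orientation(c, d, a) * orientation(c, d, b) < 0
    )


# LineString (0 0, 2 2) against LineString (2 0, 0 2): they cross at (1, 1).
diagonals = segments_cross((0, 0), (2, 2), (2, 0), (0, 2))
# Two segments that only share an endpoint touch but do not cross.
shared_endpoint = segments_cross((0, 0), (1, 1), (1, 1), (2, 0))
```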

difference(other, align=None)[source]

Returns a GeoSeries of the points in each aligned geometry that are not in other.

The operation works in a 1-to-1 row-wise manner.

Note: Unlike GeoPandas, Sedona does not support this operation for GeometryCollections.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to find the difference to.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.

Return type:

GeoSeries

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(1, 0), (1, 3)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(1, 1),
...         Point(0, 1),
...     ],
...     index=range(1, 6),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2             LINESTRING (0 0, 2 2)
3             LINESTRING (2 0, 0 2)
4                       POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2             LINESTRING (1 0, 1 3)
3             LINESTRING (2 0, 0 2)
4                       POINT (1 1)
5                       POINT (0 1)
dtype: geometry

We can compute the difference of each geometry and a single geometry:

>>> point = Point(0, 1)
>>> s2.difference(point)
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2             LINESTRING (1 0, 1 3)
3             LINESTRING (2 0, 0 2)
4                       POINT (1 1)
5           GEOMETRYCOLLECTION EMPTY
dtype: geometry

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.difference(s2, align=True)
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2             LINESTRING (0 0, 2 2)
3             LINESTRING (2 0, 0 2)
4                       POINT (0 1)
5                       POINT (0 1)
dtype: geometry
>>> s.difference(s2, align=False)
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2           GEOMETRYCOLLECTION EMPTY
3             LINESTRING (2 0, 0 2)
4           GEOMETRYCOLLECTION EMPTY
dtype: geometry

Notes

This method works in a row-wise manner. It does not compute the difference between an element of one GeoSeries and every element of the other one.

See also

GeoSeries.intersection

distance(other, align=None)[source]

Returns a Series containing the distance to aligned other.

The operation works in a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to find the distance to.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.

Return type:

Series (float)

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 0), (1, 1)]),
...         Polygon([(0, 0), (-1, 0), (-1, 1)]),
...         LineString([(1, 1), (0, 0)]),
...         Point(0, 0),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]),
...         Point(3, 1),
...         LineString([(1, 0), (2, 0)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0      POLYGON ((0 0, 1 0, 1 1, 0 0))
1    POLYGON ((0 0, -1 0, -1 1, 0 0))
2               LINESTRING (1 1, 0 0)
3                         POINT (0 0)
dtype: geometry
>>> s2
1    POLYGON ((0.5 0.5, 1.5 0.5, 1.5 1.5, 0.5 1.5, ...
2                                          POINT (3 1)
3                                LINESTRING (1 0, 2 0)
4                                          POINT (0 1)
dtype: geometry

We can check the distance of each geometry of GeoSeries to a single geometry:

>>> point = Point(-1, 0)
>>> s.distance(point)
0    1.0
1    0.0
2    1.0
3    1.0
dtype: float64

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and use elements with the same index using align=True or ignore index and use elements based on their matching order using align=False:

>>> s.distance(s2, align=True)
0         NaN
1    0.707107
2    2.000000
3    1.000000
4         NaN
dtype: float64
>>> s.distance(s2, align=False)
0    0.000000
1    3.162278
2    0.707107
3    1.000000
dtype: float64
dwithin(other, distance, align=None)[source]

Returns a Series of dtype('bool') with value True for each aligned geometry that is within a set distance from other.

The operation works in a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test for being within the given distance.

  • distance (float, np.array, pd.Series) – Distance(s) to test if each geometry is within. A scalar distance will be applied to all geometries. An array or Series will be applied elementwise. If an np.array or pd.Series is used, it must have the same length as the GeoSeries.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

Series (bool)

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (0, 2)]),
...         LineString([(0, 0), (0, 1)]),
...         Point(0, 1),
...     ],
...     index=range(0, 4),
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(1, 0), (4, 2), (2, 2)]),
...         Polygon([(2, 0), (3, 2), (2, 2)]),
...         LineString([(2, 0), (2, 2)]),
...         Point(1, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1             LINESTRING (0 0, 0 2)
2             LINESTRING (0 0, 0 1)
3                       POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((1 0, 4 2, 2 2, 1 0))
2    POLYGON ((2 0, 3 2, 2 2, 2 0))
3             LINESTRING (2 0, 2 2)
4                       POINT (1 1)
dtype: geometry

We can check if each geometry of GeoSeries is within a set distance of a single geometry:

>>> point = Point(0, 1)
>>> s2.dwithin(point, 1.8)
1     True
2    False
3    False
4     True
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.dwithin(s2, distance=1, align=True)
0    False
1     True
2    False
3    False
4    False
dtype: bool
>>> s.dwithin(s2, distance=1, align=False)
0     True
1    False
2    False
3     True
dtype: bool

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries is within the set distance of any element of the other one.

See also

GeoSeries.within
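
The handling of a scalar versus a per-row distance can be sketched for the point-to-point case. This pure-Python helper, dwithin_points, is hypothetical. It uses an inclusive comparison, which is consistent with the example output above, where Point (0 1) and Point (1 1) are exactly 1 apart and return True for distance=1.

```python
import math


def dwithin_points(left, right, distance):
    """Elementwise check that paired points are at most `distance` apart."""
    if isinstance(distance, (int, float)):
        distance = [distance] * len(left)  # broadcast a scalar to every row
    if not (len(left) == len(right) == len(distance)):
        raise ValueError("inputs must have the same length")
    return [math.dist(a, b) <= d for a, b, d in zip(left, right, distance)]


left = [(0, 1), (0, 1)]
right = [(1, 1), (3, 1)]
scalar_result = dwithin_points(left, right, 1)            # [True, False]
per_row_result = dwithin_points(left, right, [0.5, 3.0])  # [False, True]
```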

property envelope

Returns a GeoSeries of geometries representing the envelope of each geometry.

The envelope of a geometry is the bounding rectangle. That is, the point or smallest rectangular polygon (with sides parallel to the coordinate axes) that contains the geometry.

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point, MultiPoint
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...         MultiPoint([(0, 0), (1, 1)]),
...         Point(0, 0),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1        LINESTRING (0 0, 1 1, 1 0)
2         MULTIPOINT ((0 0), (1 1))
3                       POINT (0 0)
dtype: geometry
>>> s.envelope
0    POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))
1    POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))
2    POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))
3                            POINT (0 0)
dtype: geometry

See also

GeoSeries.convex_hull

convex hull geometry
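
The envelope rule described above (a point for a single location, otherwise an axis-parallel rectangle) can be sketched on raw coordinates. This is a simplified pure-Python illustration with a hypothetical helper name. It does not treat horizontal or vertical lines specially, which a real geometry engine may do.

```python
def envelope(coords):
    """Envelope of a coordinate list: one point, or a closed rectangular ring."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    minx, miny, maxx, maxy = min(xs), min(ys), max(xs), max(ys)
    if minx == maxx and miny == maxy:
        return [(minx, miny)]  # all coordinates coincide: the envelope is a point
    # Counter-clockwise ring, closed by repeating the first vertex.
    return [(minx, miny), (maxx, miny), (maxx, maxy), (minx, maxy), (minx, miny)]


line_env = envelope([(0, 0), (1, 1), (1, 0)])  # rectangle from (0, 0) to (1, 1)
point_env = envelope([(0, 0)])                 # [(0, 0)]
```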

property geom_type

Returns a Series of strings specifying the geometry type of each geometry.

Note: Unlike GeoPandas, Sedona returns LineString instead of LinearRing.

Returns:

A Series containing the geometry type of each geometry.

Return type:

Series

Examples

>>> from shapely.geometry import Polygon, Point
>>> from sedona.spark.geopandas import GeoSeries
>>> gs = GeoSeries([Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]), Point(0, 0)])
>>> gs.geom_type
0    POLYGON
1    POINT
dtype: object
get_geometry(index)[source]

Returns the n-th geometry from a collection of geometries (0-indexed).

If the index is non-negative, it returns the geometry at that index. If the index is negative, it counts backward from the end of the collection (e.g., -1 returns the last geometry). Returns None if the index is out of bounds.

Note: Simple geometries act as length-1 collections.

Note: Using Shapely < 2.0 may lead to different results for empty simple geometries because of how Shapely interprets them.

Parameters:

index (int or array_like) – Position of a geometry to be retrieved within its collection

Return type:

GeoSeries

Notes

Simple geometries act as collections of length 1. Any out-of-range index value returns None.

Examples

>>> import geopandas
>>> from shapely.geometry import Point, MultiPoint, GeometryCollection, Polygon
>>> s = geopandas.GeoSeries(
...     [
...         Point(0, 0),
...         MultiPoint([(0, 0), (1, 1), (0, 1), (1, 0)]),
...         GeometryCollection(
...             [MultiPoint([(0, 0), (1, 1), (0, 1), (1, 0)]), Point(0, 1)]
...         ),
...         Polygon(),
...         GeometryCollection(),
...     ]
... )
>>> s
0                                          POINT (0 0)
1              MULTIPOINT ((0 0), (1 1), (0 1), (1 0))
2    GEOMETRYCOLLECTION (MULTIPOINT ((0 0), (1 1), ...
3                                        POLYGON EMPTY
4                             GEOMETRYCOLLECTION EMPTY
dtype: geometry
>>> s.get_geometry(0)
0                                POINT (0 0)
1                                POINT (0 0)
2    MULTIPOINT ((0 0), (1 1), (0 1), (1 0))
3                              POLYGON EMPTY
4                                       None
dtype: geometry
>>> s.get_geometry(1)
0           None
1    POINT (1 1)
2    POINT (0 1)
3           None
4           None
dtype: geometry
>>> s.get_geometry(-1)
0      POINT (0 0)
1      POINT (1 0)
2      POINT (0 1)
3    POLYGON EMPTY
4             None
dtype: geometry
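
The indexing rules above behave much like Python list indexing, except that an out-of-range index gives None instead of raising. A minimal sketch, using plain lists of WKT strings as stand-ins for geometry collections (get_part is a hypothetical helper):

```python
def get_part(parts, index):
    """Return parts[index], supporting negative indices; None if out of range."""
    if -len(parts) <= index < len(parts):
        return parts[index]
    return None


point = ["POINT (0 0)"]  # a simple geometry acts as a length-1 collection
multipoint = ["POINT (0 0)", "POINT (1 1)", "POINT (0 1)", "POINT (1 0)"]
empty_collection = []    # GEOMETRYCOLLECTION EMPTY has no parts

first = [get_part(g, 0) for g in (point, multipoint, empty_collection)]
second = [get_part(g, 1) for g in (point, multipoint, empty_collection)]
last = [get_part(g, -1) for g in (point, multipoint, empty_collection)]
```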
property has_sindex

Check the existence of the spatial index without generating it.

Use the .sindex attribute on a GeoDataFrame or GeoSeries to generate a spatial index if it does not yet exist, which may take considerable time based on the underlying index implementation.

Note that the underlying spatial index may not be fully initialized until the first use.

Currently, the sindex is not retained when this property is accessed from a GeoDataFrame. You can work around this by first extracting the active geometry column as a GeoSeries and accessing the property on that.

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Point
>>> s = GeoSeries([Point(x, x) for x in range(5)])
>>> s.has_sindex
False
>>> index = s.sindex
>>> s.has_sindex
True
Returns:

True if the spatial index has been generated or False if not.

Return type:

bool
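
The pattern described here, an index that is built on first access and a flag that reports whether it exists without triggering the build, can be illustrated with a toy class. LazyIndexed below is hypothetical and is not a Sedona class. Its "index" is just a sorted list of positions standing in for a real spatial index.

```python
class LazyIndexed:
    """Toy container with a lazily built index (illustration only)."""

    def __init__(self, points):
        self._points = list(points)
        self._sindex = None  # nothing is built until .sindex is first accessed

    @property
    def has_sindex(self):
        """Report whether the index exists, without building it."""
        return self._sindex is not None

    @property
    def sindex(self):
        """Build the index on first access, then reuse the cached result."""
        if self._sindex is None:
            order = range(len(self._points))
            self._sindex = sorted(order, key=lambda i: self._points[i])
        return self._sindex


s = LazyIndexed([(x, x) for x in range(5)])
before = s.has_sindex  # False: checking does not build the index
index = s.sindex       # first access builds and caches it
after = s.has_sindex   # True
```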

property has_z

Returns a Series of dtype('bool') with value True for features that have a z-component.

Notes

Every operation in GeoPandas is planar, i.e. the potential third dimension is not taken into account.

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Point
>>> s = GeoSeries(
...     [
...         Point(0, 1),
...         Point(0, 1, 2),
...     ]
... )
>>> s
0        POINT (0 1)
1    POINT Z (0 1 2)
dtype: geometry
>>> s.has_z
0    False
1     True
dtype: bool
intersection(other, align=None)[source]

Returns a GeoSeries of the intersection of points in each aligned geometry with other.

The operation works in a 1-to-1 row-wise manner.

Note: Unlike most functions, intersection may return results that are unordered with respect to the index. If the order matters, call sort_index() on the result.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to find the intersection with.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.

Return type:

GeoSeries

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(1, 0), (1, 3)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(1, 1),
...         Point(0, 1),
...     ],
...     index=range(1, 6),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2             LINESTRING (0 0, 2 2)
3             LINESTRING (2 0, 0 2)
4                       POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2             LINESTRING (1 0, 1 3)
3             LINESTRING (2 0, 0 2)
4                       POINT (1 1)
5                       POINT (0 1)
dtype: geometry

We can compute the intersection of each geometry with a single Shapely geometry:

>>> s.intersection(Polygon([(0, 0), (1, 1), (0, 1)]))
0    POLYGON ((0 0, 0 1, 1 1, 0 0))
1    POLYGON ((0 0, 0 1, 1 1, 0 0))
2             LINESTRING (0 0, 1 1)
3                       POINT (1 1)
4                       POINT (0 1)
dtype: geometry

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.intersection(s2, align=True)
0                              None
1    POLYGON ((0 0, 0 1, 1 1, 0 0))
2                       POINT (1 1)
3             LINESTRING (2 0, 0 2)
4                       POINT EMPTY
5                              None
dtype: geometry
>>> s.intersection(s2, align=False)
0    POLYGON ((0 0, 0 1, 1 1, 0 0))
1             LINESTRING (1 1, 1 2)
2                       POINT (1 1)
3                       POINT (1 1)
4                       POINT (0 1)
dtype: geometry

See also

GeoSeries.difference, GeoSeries.symmetric_difference, GeoSeries.union

intersection_all()[source]
intersects(other, align=None)[source]

Returns a Series of dtype('bool') with value True for each aligned geometry that intersects other.

An object is said to intersect other if its boundary and interior intersect in any way with those of the other.

The operation works in a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test if it is intersected.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.

Return type:

Series (bool)

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         LineString([(1, 0), (1, 3)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(1, 1),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1             LINESTRING (0 0, 2 2)
2             LINESTRING (2 0, 0 2)
3                       POINT (0 1)
dtype: geometry
>>> s2
1    LINESTRING (1 0, 1 3)
2    LINESTRING (2 0, 0 2)
3              POINT (1 1)
4              POINT (0 1)
dtype: geometry

We can check if each geometry of GeoSeries intersects a single geometry:

>>> line = LineString([(-1, 1), (3, 1)])
>>> s.intersects(line)
0    True
1    True
2    True
3    True
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.intersects(s2, align=True)
0    False
1     True
2     True
3    False
4    False
dtype: bool
>>> s.intersects(s2, align=False)
0    True
1    True
2    True
3    True
dtype: bool

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries intersects any element of the other one.
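
The difference between align=True and align=False can be sketched in plain Python. This is only an illustration of the pairing rules shown in the examples above, not Sedona's implementation: closed 1-D intervals stand in for geometries, and the helper names intervals_intersect and rowwise are hypothetical.

```python
def intervals_intersect(a, b):
    """Closed 1-D intervals (lo, hi) intersect if they share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def rowwise(left, right, predicate, align=True):
    """Apply predicate pairwise to two index -> value mappings (hypothetical helper)."""
    if align:
        # Pair rows by index label; a label missing on either side yields False.
        labels = sorted(set(left) | set(right))
        return {
            i: predicate(left[i], right[i]) if i in left and i in right else False
            for i in labels
        }
    # Pair rows by position and keep the left-hand labels.
    return {i: predicate(a, b) for (i, a), b in zip(left.items(), right.values())}


s = {0: (0, 2), 1: (1, 3), 2: (5, 6)}
s2 = {1: (2, 4), 2: (0, 1), 3: (5, 6)}

aligned = rowwise(s, s2, intervals_intersect, align=True)
positional = rowwise(s, s2, intervals_intersect, align=False)
print(aligned)     # {0: False, 1: True, 2: False, 3: False}
print(positional)  # {0: True, 1: True, 2: True}
```

With alignment, the result covers the union of both indices and unmatched labels evaluate to False; without it, rows are paired in order and the left-hand index is kept.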

See also

GeoSeries.disjoint, GeoSeries.crosses, GeoSeries.touches, GeoSeries.intersection

property is_empty

Returns a Series of dtype('bool') with value True for empty geometries.

Examples

An example of a GeoSeries with one empty point, one point and one missing value:

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Point
>>> geoseries = GeoSeries([Point(), Point(2, 1), None], crs="EPSG:4326")
>>> geoseries
0  POINT EMPTY
1  POINT (2 1)
2         None
>>> geoseries.is_empty
0     True
1    False
2    False
dtype: bool

See also

GeoSeries.isna

detect missing geometries

property is_ring

Returns a Series of dtype('bool') with value True for features that are closed.

When constructing a LinearRing, the sequence of coordinates may be explicitly closed by passing identical values in the first and last indices. Otherwise, the sequence will be implicitly closed by copying the first tuple to the last index.

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import LineString, LinearRing
>>> s = GeoSeries(
...     [
...         LineString([(0, 0), (1, 1), (1, -1)]),
...         LineString([(0, 0), (1, 1), (1, -1), (0, 0)]),
...         LinearRing([(0, 0), (1, 1), (1, -1)]),
...     ]
... )
>>> s
0         LINESTRING (0 0, 1 1, 1 -1)
1    LINESTRING (0 0, 1 1, 1 -1, 0 0)
2    LINEARRING (0 0, 1 1, 1 -1, 0 0)
dtype: geometry
>>> s.is_ring
0    False
1     True
2     True
dtype: bool
property is_simple

Returns a Series of dtype('bool') with value True for geometries that do not cross themselves.

This is meaningful only for LineStrings and LinearRings.

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import LineString
>>> s = GeoSeries(
...     [
...         LineString([(0, 0), (1, 1), (1, -1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, -1)]),
...     ]
... )
>>> s
0    LINESTRING (0 0, 1 1, 1 -1, 0 1)
1         LINESTRING (0 0, 1 1, 1 -1)
dtype: geometry
>>> s.is_simple
0    False
1     True
dtype: bool
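
As a rough illustration of what "does not cross itself" means, the following pure-Python sketch tests every pair of non-adjacent segments of a coordinate list for intersection. It is a brute-force stand-in, not the check Sedona runs; the function names are hypothetical, and overlapping adjacent segments (a line that backtracks over itself) are deliberately not handled. The is_ring_line helper follows Shapely's definition of a ring as a line that is both closed and simple.

```python
def orientation(a, b, c):
    """Return 1, -1 or 0 for a counter-clockwise, clockwise or collinear triple."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (cross > 0) - (cross < 0)


def in_box(a, b, p):
    """True if p lies inside the bounding box of segment a-b."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def segments_intersect(p1, p2, q1, q2):
    """True if segments p1-p2 and q1-q2 share at least one point."""
    o1, o2 = orientation(p1, p2, q1), orientation(p1, p2, q2)
    o3, o4 = orientation(q1, q2, p1), orientation(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear cases: an endpoint of one segment lies on the other segment.
    return ((o1 == 0 and in_box(p1, p2, q1)) or (o2 == 0 and in_box(p1, p2, q2))
            or (o3 == 0 and in_box(q1, q2, p1)) or (o4 == 0 and in_box(q1, q2, p2)))


def is_simple_line(coords):
    """Brute-force simplicity check for a line given as a list of (x, y) tuples."""
    segments = list(zip(coords, coords[1:]))
    closed = coords[0] == coords[-1]
    n = len(segments)
    for i in range(n):
        for j in range(i + 2, n):  # adjacent segments always share one vertex
            if closed and i == 0 and j == n - 1:
                continue  # first and last segment of a closed line share the start point
            if segments_intersect(*segments[i], *segments[j]):
                return False
    return True


def is_ring_line(coords):
    """A ring is a line that is both closed and simple."""
    return coords[0] == coords[-1] and is_simple_line(coords)


print(is_simple_line([(0, 0), (1, 1), (1, -1), (0, 1)]))  # False
print(is_simple_line([(0, 0), (1, 1), (1, -1)]))          # True
print(is_ring_line([(0, 0), (1, 1), (1, -1), (0, 0)]))    # True
```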
property is_valid

Returns a Series of dtype('bool') with value True for geometries that are valid.

Examples

An example with one invalid polygon (a bowtie geometry crossing itself) and one missing geometry:

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         Polygon([(0,0), (1, 1), (1, 0), (0, 1)]),  # bowtie geometry
...         Polygon([(0, 0), (2, 2), (2, 0)]),
...         None
...     ]
... )
>>> s
0         POLYGON ((0 0, 1 1, 0 1, 0 0))
1    POLYGON ((0 0, 1 1, 1 0, 0 1, 0 0))
2         POLYGON ((0 0, 2 2, 2 0, 0 0))
3                                   None
dtype: geometry
>>> s.is_valid
0     True
1    False
2     True
3    False
dtype: bool

See also

GeoSeries.is_valid_reason

reason for invalidity

is_valid_reason()[source]

Returns a Series of strings with the reason for invalidity of each geometry.

Examples

An example with two invalid polygons (a bowtie geometry crossing itself and a ring touching itself) and one missing geometry:

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         Polygon([(0,0), (1, 1), (1, 0), (0, 1)]),  # bowtie geometry
...         Polygon([(0, 0), (2, 2), (2, 0)]),
...         Polygon([(0, 0), (2, 0), (1, 1), (2, 2), (0, 2), (1, 1), (0, 0)]),
...         None
...     ]
... )
>>> s
0                   POLYGON ((0 0, 1 1, 0 1, 0 0))
1              POLYGON ((0 0, 1 1, 1 0, 0 1, 0 0))
2                   POLYGON ((0 0, 2 2, 2 0, 0 0))
3    POLYGON ((0 0, 2 0, 1 1, 2 2, 0 2, 1 1, 0 0))
4                                             None
dtype: geometry
>>> s.is_valid_reason()
0    Valid Geometry
1    Self-intersection at or near point (0.5, 0.5, NaN)
2    Valid Geometry
3    Ring Self-intersection at or near point (1.0, 1.0)
4    None
dtype: object

See also

GeoSeries.is_valid

detect invalid geometries

GeoSeries.make_valid

fix invalid geometries

property length

Returns a Series containing the length of each geometry in the GeoSeries.

In the case of a (Multi)Polygon it measures the length of its exterior (i.e. perimeter).

For a GeometryCollection it sums the values for each of the individual geometries.

Returns:

A Series containing the length of each geometry.

Return type:

Series

Examples

>>> from shapely.geometry import GeometryCollection, LineString, Point, Polygon
>>> from sedona.spark.geopandas import GeoSeries
>>> gs = GeoSeries([Point(0, 0), LineString([(0, 0), (1, 1)]), Polygon([(0, 0), (1, 0), (1, 1)]), GeometryCollection([Point(0, 0), LineString([(0, 0), (1, 1)]), Polygon([(0, 0), (1, 0), (1, 1)])])])
>>> gs.length
0    0.000000
1    1.414214
2    3.414214
3    4.828427
dtype: float64
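
The values in this example can be reproduced by hand. The sketch below is plain Python rather than Sedona code: it sums the Euclidean lengths of consecutive segments, with geometries written as coordinate lists and a hypothetical helper path_length.

```python
import math


def path_length(coords):
    """Sum of Euclidean distances between consecutive (x, y) vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


point = [(0, 0)]                          # a single vertex has no segments
line = [(0, 0), (1, 1)]
ring = [(0, 0), (1, 0), (1, 1), (0, 0)]   # polygon exterior, explicitly closed
collection = [point, line, ring]          # stand-in for the GeometryCollection

print(round(path_length(point), 6))  # 0
print(round(path_length(line), 6))   # 1.414214
print(round(path_length(ring), 6))   # 3.414214
print(round(sum(path_length(part) for part in collection), 6))  # 4.828427
```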
make_valid(*, method='linework', keep_collapsed=True)[source]

Repairs invalid geometries.

Returns a GeoSeries with valid geometries.

If the input geometry is already valid, then it will be preserved. In many cases, in order to create a valid geometry, the input geometry must be split into multiple parts or multiple geometries. If the geometry must be split into multiple parts of the same type to be made valid, then a multi-part geometry will be returned (e.g. a MultiPolygon). If the geometry must be split into multiple parts of different types to be made valid, then a GeometryCollection will be returned.

In Sedona, only the ‘structure’ method is available:

  • the ‘structure’ algorithm tries to reason from the structure of the input to find the ‘correct’ repair: exterior rings bound area, interior holes exclude area. It first makes all rings valid, then shells are merged and holes are subtracted from the shells to generate valid result. It assumes that holes and shells are correctly categorized in the input geometry.

Parameters:
  • method ({'linework', 'structure'}, default 'linework') – Algorithm to use when repairing geometry. Sedona Geopandas only supports the ‘structure’ method. The default is ‘linework’ for compatibility with GeoPandas, but the method must be explicitly set to ‘structure’ to use the Sedona implementation.

  • keep_collapsed (bool, default True) – For the ‘structure’ method, True will keep components that have collapsed into a lower dimensionality. For example, a ring collapsing to a line, or a line collapsing to a point.

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import MultiPolygon, Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (0, 2), (1, 1), (2, 2), (2, 0), (1, 1), (0, 0)]),
...         Polygon([(0, 2), (0, 1), (2, 0), (0, 0), (0, 2)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...     ],
... )
>>> s
0    POLYGON ((0 0, 0 2, 1 1, 2 2, 2 0, 1 1, 0 0))
1              POLYGON ((0 2, 0 1, 2 0, 0 0, 0 2))
2                       LINESTRING (0 0, 1 1, 1 0)
dtype: geometry
>>> s.make_valid(method="structure")
0    MULTIPOLYGON (((1 1, 0 0, 0 2, 1 1)), ((2 0, 1...
1                       POLYGON ((0 1, 2 0, 0 0, 0 1))
2                           LINESTRING (0 0, 1 1, 1 0)
dtype: geometry
overlaps(other, align=None)[source]

Returns True for all aligned geometries that overlap other, else False.

In GeoPandas, geometries overlap if they have more than one but not all points in common, have the same dimension, and the intersection of the interiors of the geometries has the same dimension as the geometries themselves.

In Sedona, however, overlaps also returns True when the geometries' points match.

Note: Sedona’s behavior may also differ from Geopandas for GeometryCollections.

The operation works on a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test for overlap.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.

Return type:

Series (bool)

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, MultiPoint, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         MultiPoint([(0, 0), (0, 1)]),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 0), (0, 2)]),
...         LineString([(0, 1), (1, 1)]),
...         LineString([(1, 1), (3, 3)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )

We can check if each geometry of GeoSeries overlaps a single geometry:

>>> polygon = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
>>> s.overlaps(polygon)
0     True
1     True
2    False
3    False
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We align both GeoSeries based on index values and compare elements with the same index.

>>> s.overlaps(s2)
0    False
1     True
2    False
3    False
4    False
dtype: bool
>>> s.overlaps(s2, align=False)
0     True
1    False
2     True
3    False
dtype: bool

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries overlaps any element of the other one.

See also

GeoSeries.crosses, GeoSeries.intersects

abstractmethod plot(*args, **kwargs)[source]
segmentize(max_segment_length)[source]

Returns a GeoSeries with vertices added to line segments based on maximum segment length.

Additional vertices will be added to every line segment in an input geometry so that segments are no longer than the provided maximum segment length. New vertices will evenly subdivide each segment. Only linear components of input geometries are densified; other geometries are returned unmodified.

Parameters:

max_segment_length (float | array-like) – Additional vertices will be added so that all line segments are no longer than this value. Must be greater than 0.

Return type:

GeoSeries

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString
>>> s = GeoSeries(
...     [
...         LineString([(0, 0), (0, 10)]),
...         Polygon([(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]),
...     ],
... )
>>> s
0                     LINESTRING (0 0, 0 10)
1    POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))
dtype: geometry
>>> s.segmentize(max_segment_length=5)
0                          LINESTRING (0 0, 0 5, 0 10)
1    POLYGON ((0 0, 5 0, 10 0, 10 5, 10 10, 5 10, 0...
dtype: geometry
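
The even subdivision described above can be sketched in a few lines of plain Python. This is an illustration of the rule rather than Sedona's code path: each segment is cut into the smallest number of equal pieces that keeps every piece within max_segment_length. The function name and the coordinate-list representation are assumptions made for the example.

```python
import math


def segmentize(coords, max_segment_length):
    """Insert evenly spaced vertices so that no segment exceeds max_segment_length."""
    if max_segment_length <= 0:
        raise ValueError("max_segment_length must be greater than 0")
    out = [coords[0]]
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        pieces = max(1, math.ceil(length / max_segment_length))
        for k in range(1, pieces + 1):
            t = k / pieces  # fraction of the way along the segment
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


print(segmentize([(0, 0), (0, 10)], 5))  # [(0, 0), (0.0, 5.0), (0.0, 10.0)]
```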
simplify(tolerance=None, preserve_topology=True)[source]

Returns a GeoSeries containing a simplified representation of each geometry.

The algorithm (Douglas-Peucker) recursively splits the original line into smaller parts and connects these parts’ endpoints by a straight line. Then, it removes all points whose distance to the straight line is smaller than tolerance. It does not move any points and it always preserves endpoints of the original line or polygon. See https://shapely.readthedocs.io/en/latest/manual.html#object.simplify for details

Simplifies individual geometries independently, without considering the topology of a potential polygonal coverage. If you would like to treat the GeoSeries as a coverage and simplify its edges, while preserving the coverage topology, see simplify_coverage().

Parameters:
  • tolerance (float) – All parts of a simplified geometry will be no more than tolerance distance from the original. It has the same units as the coordinate reference system of the GeoSeries. For example, using tolerance=100 in a projected CRS with meters as units means a distance of 100 meters in reality.

  • preserve_topology (bool (default True)) – False uses a quicker algorithm, but may produce self-intersecting or otherwise invalid geometries.

Notes

Invalid geometric objects may result from simplification that does not preserve topology and simplification may be sensitive to the order of coordinates: two geometries differing only in order of coordinates may be simplified differently.

See also

simplify_coverage

simplify geometries using coverage simplification

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Point, LineString
>>> s = GeoSeries(
...     [Point(0, 0).buffer(1), LineString([(0, 0), (1, 10), (0, 20)])]
... )
>>> s
0    POLYGON ((1 0, 0.99518 -0.09802, 0.98079 -0.19...
1                         LINESTRING (0 0, 1 10, 0 20)
dtype: geometry
>>> s.simplify(1)
0    POLYGON ((0 1, 0 -1, -1 0, 0 1))
1              LINESTRING (0 0, 0 20)
dtype: geometry
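
For intuition, here is a minimal pure-Python sketch of the classic Douglas-Peucker recursion on a single open line. It is not the routine Sedona executes, and it ignores topology preservation. Treating a vertex at exactly the tolerance distance as removable is an assumption chosen to match the LineString result in the example above.

```python
import math


def point_line_distance(p, a, b):
    """Distance from p to the infinite line through a and b (to a itself if a == b)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / norm


def douglas_peucker(coords, tolerance):
    """Simplify a list of (x, y) vertices, always keeping both endpoints."""
    if len(coords) < 3:
        return list(coords)
    a, b = coords[0], coords[-1]
    # Find the interior vertex farthest from the chord a-b.
    index, farthest = max(
        ((i, point_line_distance(coords[i], a, b)) for i in range(1, len(coords) - 1)),
        key=lambda item: item[1],
    )
    if farthest <= tolerance:
        return [a, b]  # every interior vertex is close enough to drop
    # Otherwise keep the farthest vertex and simplify both halves.
    left = douglas_peucker(coords[: index + 1], tolerance)
    right = douglas_peucker(coords[index:], tolerance)
    return left[:-1] + right


line = [(0, 0), (1, 10), (0, 20)]
print(douglas_peucker(line, 1))    # [(0, 0), (0, 20)]
print(douglas_peucker(line, 0.5))  # [(0, 0), (1, 10), (0, 20)]
```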
property sindex: SpatialIndex

Returns a spatial index for the GeoSeries.

Note that the spatial index may not be fully initialized until the first use.

Currently, sindex is not retained when accessed from a GeoDataFrame. You can work around this by first extracting the active geometry column as a GeoSeries and accessing sindex on it.

Returns:

The spatial index.

Return type:

SpatialIndex

Examples

>>> from shapely.geometry import Point, box
>>> from sedona.spark.geopandas import GeoSeries
>>>
>>> s = GeoSeries([Point(x, x) for x in range(5)])
>>> s.sindex.query(box(1, 1, 3, 3))
[Point(1, 1), Point(2, 2), Point(3, 3)]
>>> s.has_sindex
True
snap(other, tolerance, align=None)[source]

Snap the vertices and segments of the geometry to vertices of the reference.

Vertices and segments of the input geometry are snapped to vertices of the reference geometry, returning a new geometry; the input geometries are not modified. The result geometry is the input geometry with the vertices and segments snapped. If no snapping occurs then the input geometry is returned unchanged. The tolerance is used to control where snapping is performed.

Where possible, this operation tries to avoid creating invalid geometries; however, it does not guarantee that output geometries will be valid. It is the responsibility of the caller to check for and handle invalid geometries.

Because too much snapping can result in invalid geometries being created, heuristics are used to determine the number and location of snapped vertices that are likely safe to snap. These heuristics may omit some potential snaps that are otherwise within the tolerance.

Note: Sedona’s result may differ slightly from geopandas’s snap() result because of small differences between the underlying engines being used.

The operation works in a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to snap to.

  • tolerance (float or array-like) – Maximum distance between vertices that shall be snapped.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.

Return type:

GeoSeries

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Point(0.5, 2.5),
...         LineString([(0.1, 0.1), (0.49, 0.51), (1.01, 0.89)]),
...         Polygon([(0, 0), (0, 10), (10, 10), (10, 0), (0, 0)]),
...     ],
... )
>>> s
0                               POINT (0.5 2.5)
1    LINESTRING (0.1 0.1, 0.49 0.51, 1.01 0.89)
2       POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0))
dtype: geometry
>>> s2 = GeoSeries(
...     [
...         Point(0, 2),
...         LineString([(0, 0), (0.5, 0.5), (1.0, 1.0)]),
...         Point(8, 10),
...     ],
...     index=range(1, 4),
... )
>>> s2
1                       POINT (0 2)
2    LINESTRING (0 0, 0.5 0.5, 1 1)
3                      POINT (8 10)
dtype: geometry

We can snap each geometry to a single shapely geometry:

>>> s.snap(Point(0, 2), tolerance=1)
0                                     POINT (0 2)
1      LINESTRING (0.1 0.1, 0.49 0.51, 1.01 0.89)
2    POLYGON ((0 0, 0 2, 0 10, 10 10, 10 0, 0 0))
dtype: geometry

We can also snap two GeoSeries to each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and snap elements with the same index using align=True or ignore index and snap elements based on their matching order using align=False:

>>> s.snap(s2, tolerance=1, align=True)
0                                                 None
1           LINESTRING (0.1 0.1, 0.49 0.51, 1.01 0.89)
2    POLYGON ((0.5 0.5, 1 1, 0 10, 10 10, 10 0, 0.5...
3                                                 None
dtype: geometry
>>> s.snap(s2, tolerance=1, align=False)
0                                      POINT (0 2)
1                   LINESTRING (0 0, 0.5 0.5, 1 1)
2    POLYGON ((0 0, 0 10, 8 10, 10 10, 10 0, 0 0))
dtype: geometry
abstractmethod to_geopandas() GeoSeries | GeoDataFrame[source]
to_parquet(path, **kwargs)[source]
property total_bounds

Returns a tuple containing minx, miny, maxx, maxy values for the bounds of the series as a whole.

See GeoSeries.bounds for the bounds of the geometries contained in the series.

Examples

>>> from shapely.geometry import Point, Polygon, LineString
>>> from sedona.spark.geopandas import GeoDataFrame
>>> d = {'geometry': [Point(3, -1), Polygon([(0, 0), (1, 1), (1, 0)]),
... LineString([(0, 1), (1, 2)])]}
>>> gdf = GeoDataFrame(d, crs="EPSG:4326")
>>> gdf.total_bounds
array([ 0., -1.,  3.,  2.])
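
The relationship between per-geometry bounds and total_bounds can be checked by hand. The following plain-Python sketch uses coordinate lists in place of geometries and two hypothetical helpers; it does not call Sedona.

```python
def bounds(coords):
    """(minx, miny, maxx, maxy) of one geometry given as (x, y) vertices."""
    xs, ys = zip(*coords)
    return (min(xs), min(ys), max(xs), max(ys))


def total_bounds(per_geometry_bounds):
    """Combine per-geometry bounds into the bounds of the whole series."""
    minx, miny, maxx, maxy = zip(*per_geometry_bounds)
    return (min(minx), min(miny), max(maxx), max(maxy))


geometries = [
    [(3, -1)],                         # Point(3, -1)
    [(0, 0), (1, 1), (1, 0), (0, 0)],  # polygon exterior
    [(0, 1), (1, 2)],                  # LineString
]
rows = [bounds(g) for g in geometries]
print(rows)                # [(3, -1, 3, -1), (0, 0, 1, 1), (0, 1, 1, 2)]
print(total_bounds(rows))  # (0, -1, 3, 2)
```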
touches(other, align=None)[source]

Returns a Series of dtype('bool') with value True for each aligned geometry that touches other.

An object is said to touch other if it has at least one point in common with other and its interior does not intersect with any part of the other. Overlapping features therefore do not touch.

Note: Sedona’s behavior may also differ from Geopandas for GeometryCollections.

The operation works on a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test each geometry against.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.

Return type:

Series (bool)

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, MultiPoint, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         MultiPoint([(0, 0), (0, 1)]),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (-2, 0), (0, -2)]),
...         LineString([(0, 1), (1, 1)]),
...         LineString([(1, 1), (3, 0)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2             LINESTRING (0 0, 2 2)
3         MULTIPOINT ((0 0), (0 1))
dtype: geometry
>>> s2
1    POLYGON ((0 0, -2 0, 0 -2, 0 0))
2               LINESTRING (0 1, 1 1)
3               LINESTRING (1 1, 3 0)
4                         POINT (0 1)
dtype: geometry

We can check if each geometry of GeoSeries touches a single geometry:

>>> line = LineString([(0, 0), (-1, -2)])
>>> s.touches(line)
0    True
1    True
2    True
3    True
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.touches(s2, align=True)
0    False
1     True
2     True
3    False
4    False
dtype: bool
>>> s.touches(s2, align=False)
0     True
1    False
2     True
3    False
dtype: bool

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries touches any element of the other one.

See also

GeoSeries.overlaps, GeoSeries.intersects

abstract property type
union_all(method='unary', grid_size=None) BaseGeometry[source]

Returns a geometry containing the union of all geometries in the GeoSeries.

Sedona does not support the method or grid_size arguments, so there is no need to choose the algorithm manually.

Parameters:
  • method (str (default "unary")) – Not supported in Sedona.

  • grid_size (float, default None) – Not supported in Sedona.

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import box
>>> s = GeoSeries([box(0, 0, 1, 1), box(0, 0, 2, 2)])
>>> s
0    POLYGON ((1 0, 1 1, 0 1, 0 0, 1 0))
1    POLYGON ((2 0, 2 2, 0 2, 0 0, 2 0))
dtype: geometry
>>> s.union_all()
<POLYGON ((0 1, 0 2, 2 2, 2 0, 1 0, 0 0, 0 1))>
within(other, align=None)[source]

Returns a Series of dtype('bool') with value True for each aligned geometry that is within other.

An object is said to be within other if at least one of its points is located in the interior and no points are located in the exterior of the other. If either object is empty, this operation returns False.

This is the inverse of contains in the sense that the expression a.within(b) == b.contains(a) always evaluates to True.

Note: Sedona’s behavior may also differ from Geopandas for GeometryCollections and for geometries that are equal.

The operation works on a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test if each geometry is within.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.

Return type:

Series (bool)

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (1, 2), (0, 2)]),
...         LineString([(0, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (0, 2)]),
...         LineString([(0, 0), (0, 1)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 1 2, 0 2, 0 0))
2             LINESTRING (0 0, 0 2)
3                       POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2             LINESTRING (0 0, 0 2)
3             LINESTRING (0 0, 0 1)
4                       POINT (0 1)
dtype: geometry

We can check if each geometry of GeoSeries is within a single geometry:

>>> polygon = Polygon([(0, 0), (2, 2), (0, 2)])
>>> s.within(polygon)
0     True
1     True
2    False
3    False
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s2.within(s)
0    False
1    False
2     True
3    False
4    False
dtype: bool
>>> s2.within(s, align=False)
1     True
2    False
3     True
4     True
dtype: bool

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries is within any element of the other one.

See also

GeoSeries.contains

sedona.spark.geopandas.geodataframe module

class sedona.spark.geopandas.geodataframe.GeoDataFrame(*args: Any, **kwargs: Any)[source]

Bases: GeoFrame, DataFrame

A pandas-on-Spark DataFrame for geospatial data with geometry columns.

GeoDataFrame extends pyspark.pandas.DataFrame to provide geospatial operations using Apache Sedona’s spatial functions. It maintains compatibility with GeoPandas GeoDataFrame while operating on distributed datasets.

Parameters:
  • data (dict, array-like, DataFrame, or GeoDataFrame) – Data to initialize the GeoDataFrame. Can be a dictionary, array-like structure, pandas DataFrame, GeoPandas GeoDataFrame, or another GeoDataFrame.

  • geometry (str, array-like, or GeoSeries, optional) – Column name, array of geometries, or GeoSeries to use as the active geometry. If None, will look for existing geometry columns.

  • crs (pyproj.CRS, optional) – Coordinate Reference System for the geometries.

  • columns (Index or array-like, optional) – Column labels to use for the resulting frame.

  • index (Index or array-like, optional) – Index to use for the resulting frame.

Examples

>>> from shapely.geometry import Point, Polygon
>>> from sedona.spark.geopandas import GeoDataFrame
>>> import pandas as pd
>>>
>>> # Create from dictionary with geometry
>>> data = {
...     'name': ['A', 'B', 'C'],
...     'geometry': [Point(0, 0), Point(1, 1), Point(2, 2)]
... }
>>> gdf = GeoDataFrame(data, crs='EPSG:4326')
>>> gdf
     name   geometry
0       A   POINT (0 0)
1       B   POINT (1 1)
2       C   POINT (2 2)
>>>
>>> # Spatial operations
>>> buffered = gdf.buffer(0.1)
>>> buffered.area
0    0.031416
1    0.031416
2    0.031416
dtype: float64
>>>
>>> # Spatial joins
>>> polygons = GeoDataFrame({
...     'region': ['Region1', 'Region2'],
...     'geometry': [
...         Polygon([(-1, -1), (1, -1), (1, 1), (-1, 1)]),
...         Polygon([(0.5, 0.5), (2.5, 0.5), (2.5, 2.5), (0.5, 2.5)])
...     ]
... })
>>> result = gdf.sjoin(polygons, how='left', predicate='within')
>>> result['region']
0    Region1
1    Region2
2    Region2
dtype: object
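
To see why point B at (1, 1) is matched to Region2 only, recall that within requires the point to lie in the interior of the polygon, and (1, 1) sits on the boundary of Region1. The nested-loop sketch below reproduces the join result in plain Python for axis-aligned boxes. It is purely illustrative: the helper names are hypothetical, and Sedona runs a distributed spatial join rather than a loop like this.

```python
def point_within_box(point, box):
    """True if the point lies strictly inside the box (minx, miny, maxx, maxy)."""
    x, y = point
    minx, miny, maxx, maxy = box
    return minx < x < maxx and miny < y < maxy


def left_join_within(points, regions):
    """Left join: one row per (point, matching region), None when nothing matches."""
    rows = []
    for name, point in points.items():
        matches = [r for r, box in regions.items() if point_within_box(point, box)]
        for region in matches or [None]:
            rows.append((name, region))
    return rows


points = {"A": (0, 0), "B": (1, 1), "C": (2, 2)}
regions = {"Region1": (-1, -1, 1, 1), "Region2": (0.5, 0.5, 2.5, 2.5)}
joined = left_join_within(points, regions)
print(joined)  # [('A', 'Region1'), ('B', 'Region2'), ('C', 'Region2')]
```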

Notes

This implementation differs from GeoPandas in several ways:

  • Uses Spark for distributed processing

  • Geometries are stored in WKB (Well-Known Binary) format internally

  • Some methods may have different performance characteristics

  • Not all GeoPandas methods are implemented yet (see IMPLEMENTATION_STATUS)

Performance Considerations:

  • Operations are distributed across Spark cluster

  • Avoid converting to GeoPandas (.to_geopandas()) on large datasets

  • Use .sample() for testing with large datasets

  • Spatial joins are optimized for distributed processing

Geometry Column Management:

  • Supports multiple geometry columns

  • One geometry column is designated as ‘active’ at a time

  • Active geometry is used for spatial operations and plotting

  • Use set_geometry() to change the active geometry column

See also

geopandas.GeoDataFrame

The GeoPandas equivalent

sedona.spark.geopandas.GeoSeries

Series with geometry data

__init__(data=None, index=None, columns=None, dtype=None, copy=False, geometry: Any | None = None, crs: Any | None = None, **kwargs)[source]
property active_geometry_name: Any

Return the name of the active geometry column

Returns a name if a GeoDataFrame has an active geometry column set, otherwise returns None. The return type is usually a string, but may be an integer, tuple or other hashable, depending on the contents of the dataframe columns.

You can also access the active geometry column using the .geometry property. You can set a GeoSeries to be an active geometry using the set_geometry() method.

Returns:

name of an active geometry column or None

Return type:

str or other index label supported by pandas

See also

GeoDataFrame.set_geometry

set the active geometry

copy(deep=False) GeoDataFrame[source]

Make a copy of this GeoDataFrame object.

Parameters:

deep (bool, default False) – This parameter is not supported; it exists only to match the pandas signature.

Returns:

A copy of this GeoDataFrame object.

Return type:

GeoDataFrame

Examples

>>> from shapely.geometry import Point
>>> from sedona.spark.geopandas import GeoDataFrame
>>> gdf = GeoDataFrame([{"geometry": Point(1, 1), "value1": 2, "value2": 3}])
>>> gdf_copy = gdf.copy()
>>> print(gdf_copy)
   geometry  value1  value2
0  POINT (1 1)       2       3
property crs
classmethod from_arrow(table, geometry: str | None = None, to_pandas_kwargs: dict | None = None)[source]

Construct a GeoDataFrame from an Arrow table object based on GeoArrow extension types.

See https://geoarrow.org/ for details on the GeoArrow specification.

This function accepts any tabular Arrow object implementing the Arrow PyCapsule Protocol (i.e. having an __arrow_c_array__ or __arrow_c_stream__ method).

Added in version 1.0.

Parameters:
  • table (pyarrow.Table or Arrow-compatible table) – Any tabular object implementing the Arrow PyCapsule Protocol (i.e. has an __arrow_c_array__ or __arrow_c_stream__ method). This table should have at least one column with a geoarrow geometry type.

  • geometry (str, default None) – The name of the geometry column to set as the active geometry column. If None, the first geometry column found will be used.

  • to_pandas_kwargs (dict, optional) – Arguments passed to the pa.Table.to_pandas method for non-geometry columns. This can be used to control the behavior of the conversion of the non-geometry columns to a pandas DataFrame. For example, you can use this to control the dtype conversion of the columns. By default, the to_pandas method is called with no additional arguments.

Return type:

GeoDataFrame

See also

GeoDataFrame.to_arrow, GeoSeries.from_arrow

Examples

>>> from sedona.spark.geopandas import GeoDataFrame
>>> import geoarrow.pyarrow as ga  # requires: pip install geoarrow-pyarrow
>>> import pyarrow as pa  # requires: pip install pyarrow
>>> table = pa.Table.from_arrays([
...     ga.as_geoarrow([None, "POLYGON ((0 0, 1 1, 0 1, 0 0))", "LINESTRING (0 0, -1 1, 0 -1)"]),
...     pa.array([1, 2, 3]),
...     pa.array(["a", "b", "c"]),
... ], names=["geometry", "id", "value"])
>>> gdf = GeoDataFrame.from_arrow(table)
>>> gdf
                           geometry   id  value
0                              None    1      a
1    POLYGON ((0 0, 1 1, 0 1, 0 0))    2      b
2      LINESTRING (0 0, -1 1, 0 -1)    3      c
classmethod from_dict(data: dict, geometry=None, crs: Any | None = None, **kwargs) GeoDataFrame[source]
classmethod from_features(features, crs: Any | None = None, columns: Iterable[str] | None = None) GeoDataFrame[source]
classmethod from_file(filename: str, format: str | None = None, **kwargs) GeoDataFrame[source]

Alternate constructor to create a GeoDataFrame from a file.

Parameters:
  • filename (str) – File path or file handle to read from. If the path is a directory, Sedona will read all files in that directory.

  • format (str, optional) – The format of the file to read, by default None. If None, Sedona infers the format from the file extension. Note that format inference is not supported for directories. Available formats are “shapefile”, “geojson”, “geopackage”, and “geoparquet”.

  • table_name (str, optional) – The name of the table to read from a GeoPackage file, by default None. This is required if format is “geopackage”.

  • **kwargs – Additional keyword arguments passed to the file reader.

Returns:

A new GeoDataFrame created from the file.

Return type:

GeoDataFrame

See also

GeoDataFrame.to_file

Write a GeoDataFrame to a file.
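
The format inference described for from_file can be pictured with a small standard-library sketch. The extension table and the helper name infer_format below are assumptions made for illustration; they are not Sedona's actual implementation, and the real extension mapping may differ.

```python
from pathlib import Path

# Hypothetical extension-to-format table (an assumption, not Sedona's own mapping).
_EXTENSION_TO_FORMAT = {
    ".shp": "shapefile",
    ".geojson": "geojson",
    ".json": "geojson",
    ".gpkg": "geopackage",
    ".parquet": "geoparquet",
}


def infer_format(filename, format=None):
    """Return the explicit format if given, else guess it from the file extension."""
    if format is not None:
        return format.lower()
    suffix = Path(filename).suffix.lower()
    if suffix not in _EXTENSION_TO_FORMAT:
        # A directory path (or an unknown extension) gives nothing to infer from.
        raise ValueError(f"cannot infer format from {filename!r}; pass format=...")
    return _EXTENSION_TO_FORMAT[suffix]


print(infer_format("data/cities.gpkg"))
print(infer_format("data/parts", format="geoparquet"))
```

A directory path has no extension, which is why the format must be named explicitly in that case.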

classmethod from_postgis(sql: str | sqlalchemy.text, con, geom_col: str = 'geom', crs: Any | None = None, index_col: str | list[str] | None = None, coerce_float: bool = True, parse_dates: list | dict | None = None, params: list | tuple | dict | None = None, chunksize: int | None = None) GeoDataFrame[source]
property geometry: GeoSeries

Geometry data for GeoDataFrame

iterfeatures(na: str = 'null', show_bbox: bool = False, drop_id: bool = False) Generator[dict][source]
plot(*args, **kwargs)[source]

Plot a GeoDataFrame.

Generate a plot of a GeoDataFrame with matplotlib. If a column is specified, the plot coloring will be based on values in that column.

Note: This method is not scalable and requires collecting all data to the driver.

Parameters:
  • column (str, np.array, pd.Series, pd.Index (default None)) – The name of the dataframe column, np.array, pd.Series, or pd.Index to be plotted. If np.array, pd.Series, or pd.Index are used then it must have same length as dataframe. Values are used to color the plot. Ignored if color is also set.

  • kind (str) –

    The kind of plots to produce. The default is to create a map (“geo”). Other supported kinds of plots from pandas:

    • ’line’ : line plot

    • ’bar’ : vertical bar plot

    • ’barh’ : horizontal bar plot

    • ’hist’ : histogram

    • ’box’ : BoxPlot

    • ’kde’ : Kernel Density Estimation plot

    • ’density’ : same as ‘kde’

    • ’area’ : area plot

    • ’pie’ : pie plot

    • ’scatter’ : scatter plot

    • ’hexbin’ : hexbin plot.

  • cmap (str (default None)) – The name of a colormap recognized by matplotlib.

  • color (str, np.array, pd.Series (default None)) – If specified, all objects will be colored uniformly.

  • ax (matplotlib.pyplot.Artist (default None)) – axes on which to draw the plot

  • cax (matplotlib.pyplot Artist (default None)) – axes on which to draw the legend in case of color map.

  • categorical (bool (default False)) – If False, cmap will reflect numerical values of the column being plotted. For non-numerical columns, this will be set to True.

  • legend (bool (default False)) – Plot a legend. Ignored if no column is given, or if color is given.

  • scheme (str (default None)) – Name of a choropleth classification scheme (requires mapclassify). A mapclassify.MapClassifier object will be used under the hood. Supported are all schemes provided by mapclassify (e.g. ‘BoxPlot’, ‘EqualInterval’, ‘FisherJenks’, ‘FisherJenksSampled’, ‘HeadTailBreaks’, ‘JenksCaspall’, ‘JenksCaspallForced’, ‘JenksCaspallSampled’, ‘MaxP’, ‘MaximumBreaks’, ‘NaturalBreaks’, ‘Quantiles’, ‘Percentiles’, ‘StdMean’, ‘UserDefined’). Arguments can be passed in classification_kwds.

  • k (int (default 5)) – Number of classes (ignored if scheme is None)

  • vmin (None or float (default None)) – Minimum value of cmap. If None, the minimum data value in the column to be plotted is used.

  • vmax (None or float (default None)) – Maximum value of cmap. If None, the maximum data value in the column to be plotted is used.

  • markersize (str or float or sequence (default None)) – Only applies to point geometries within a frame. If a str, will use the values in the column of the frame specified by markersize to set the size of markers. Otherwise can be a value to apply to all points, or a sequence of the same length as the number of points.

  • figsize (tuple of integers (default None)) – Size of the resulting matplotlib.figure.Figure. If the argument ax is given explicitly, figsize is ignored.

  • legend_kwds (dict (default None)) –

    Keyword arguments to pass to matplotlib.pyplot.legend() or matplotlib.pyplot.colorbar(). Additional accepted keywords when scheme is specified:

    fmt : string

    A formatting specification for the bin edges of the classes in the legend. For example, to have no decimals: {"fmt": "{:.0f}"}.

    labels : list-like

    A list of legend labels to override the auto-generated labels. Needs to have the same number of elements as the number of classes (k).

    interval : boolean (default False)

    An option to control brackets from mapclassify legend. If True, open/closed interval brackets are shown in the legend.

  • categories (list-like) – Ordered list-like object of categories to be used for categorical plot.

  • classification_kwds (dict (default None)) – Keyword arguments to pass to mapclassify

  • missing_kwds (dict (default None)) – Keyword arguments specifying color options (as style_kwds) to be passed on to geometries with missing values in addition to or overwriting other style kwds. If None, geometries with missing values are not plotted.

  • aspect ('auto', 'equal', None or float (default 'auto')) – Set aspect of axis. If ‘auto’, the default aspect for map plots is ‘equal’; if however data are not projected (coordinates are long/lat), the aspect is by default set to 1/cos(df_y * pi/180) with df_y the y coordinate of the middle of the GeoDataFrame (the mean of the y range of bounding box) so that a long/lat square appears square in the middle of the plot. This implies an Equirectangular projection. If None, the aspect of ax won’t be changed. It can also be set manually (float) as the ratio of y-unit to x-unit.

  • autolim (bool (default True)) – Update axes data limits to contain the new geometries.

  • **style_kwds (dict) – Style options to be passed on to the actual plot function, such as edgecolor, facecolor, linewidth, markersize, alpha.

Returns:

ax

Return type:

matplotlib axes instance

Examples

>>> import geodatasets  # requires: pip install geodatasets
>>> import geopandas as gpd
>>> df = gpd.read_file(geodatasets.get_path("nybb"))
>>> df.head()
   BoroCode  ...                                           geometry
0         5  ...  MULTIPOLYGON (((970217.022 145643.332, 970227....
1         4  ...  MULTIPOLYGON (((1029606.077 156073.814, 102957...
2         3  ...  MULTIPOLYGON (((1021176.479 151374.797, 102100...
3         1  ...  MULTIPOLYGON (((981219.056 188655.316, 980940....
4         2  ...  MULTIPOLYGON (((1012821.806 229228.265, 101278...
>>> df.plot("BoroName", cmap="Set1")
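
The 'auto' rule described for the aspect parameter can be checked by hand. The following is a minimal sketch of that arithmetic only; the function name auto_aspect and its arguments are hypothetical and not part of the plotting API.

```python
import math


def auto_aspect(miny, maxy, is_geographic):
    """Mimic the documented 'auto' aspect rule for a bounding box's y range."""
    if not is_geographic:
        # Projected data: one x-unit is drawn as long as one y-unit.
        return "equal"
    # Unprojected long/lat data: stretch y by 1/cos(latitude) at the middle of the box.
    mid_y = (miny + maxy) / 2.0
    return 1.0 / math.cos(mid_y * math.pi / 180.0)


print(auto_aspect(0.0, 0.0, True))     # at the equator no stretch is needed
print(auto_aspect(59.0, 61.0, True))   # around 60 degrees north the factor is about 2
```

The factor grows with latitude because a degree of longitude covers less ground away from the equator.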
rename_geometry(col: str, inplace: Literal[True] = False) None[source]
rename_geometry(col: str, inplace: Literal[False] = False) GeoDataFrame

Renames the GeoDataFrame geometry column to the specified name. By default yields a new object.

The original geometry column is replaced with the input.

Parameters:
  • col (new geometry column label)

  • inplace (boolean, default False) – Modify the GeoDataFrame in place (without creating a new object)

Examples

>>> from sedona.spark.geopandas import GeoDataFrame
>>> from shapely.geometry import Point
>>> d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> df = GeoDataFrame(d, crs="EPSG:4326")
>>> df1 = df.rename_geometry('geom1')
>>> df1.geometry.name
'geom1'
>>> df.rename_geometry('geom1', inplace=True)
>>> df.geometry.name
'geom1'

See also

GeoDataFrame.set_geometry

set the active geometry

set_crs(crs, inplace=False, allow_override=True)[source]

Set the Coordinate Reference System (CRS) of the GeoDataFrame.

If there are multiple geometry columns within the GeoDataFrame, only the CRS of the active geometry column is set.

Pass None to remove CRS from the active geometry column.

Notes

The underlying geometries are not transformed to this CRS. To transform the geometries to a new CRS, use the to_crs method.

Parameters:
  • crs (pyproj.CRS | None, optional) – The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (eg “EPSG:4326”) or a WKT string.

  • epsg (int, optional) – EPSG code specifying the projection.

  • inplace (bool, default False) – If True, the CRS of the GeoDataFrame will be changed in place (while still returning the result) instead of making a copy of the GeoDataFrame.

  • allow_override (bool, default True) – If the GeoDataFrame already has a CRS, allow replacing the existing CRS, even when the two are not equal. In Sedona, setting this to True will lead to eager evaluation instead of lazy evaluation. Unlike GeoPandas, True is the default value in Sedona for performance reasons.

Examples

>>> from sedona.spark.geopandas import GeoDataFrame
>>> from shapely.geometry import Point
>>> d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> gdf = GeoDataFrame(d)
>>> gdf
    col1     geometry
0  name1  POINT (1 2)
1  name2  POINT (2 1)

Setting CRS to a GeoDataFrame without one:

>>> gdf.crs is None
True
>>> gdf = gdf.set_crs('epsg:3857')
>>> gdf.crs
<Projected CRS: EPSG:3857>
Name: WGS 84 / Pseudo-Mercator
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: World - 85°S to 85°N
- bounds: (-180.0, -85.06, 180.0, 85.06)
Coordinate Operation:
- name: Popular Visualisation Pseudo-Mercator
- method: Popular Visualisation Pseudo Mercator
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

Overriding existing CRS:

>>> gdf = gdf.set_crs(4326, allow_override=True)

With allow_override=False, set_crs raises an error if you try to override an existing CRS.

See also

GeoDataFrame.to_crs

re-project to another CRS

set_geometry(col, drop: bool | None = None, inplace: Literal[True] = False, crs: Any | None = None) None[source]
set_geometry(col, drop: bool | None = None, inplace: Literal[False] = False, crs: Any | None = None) GeoDataFrame

Set the GeoDataFrame geometry using either an existing column or the specified input. By default yields a new object.

The original geometry column is replaced with the input.

Parameters:
  • col (column label or array-like) – An existing column name or values to set as the new geometry column. If values (array-like, (Geo)Series) are passed, then if they are named (Series) the new geometry column will have the corresponding name, otherwise the existing geometry column will be replaced. If there is no existing geometry column, the new geometry column will use the default name “geometry”.

  • drop (boolean, default False) –

    When specifying a named Series or an existing column name for col, controls if the previous geometry column should be dropped from the result. The default of False keeps both the old and new geometry column.

    Deprecated since version 1.0.0.

  • inplace (boolean, default False) – Modify the GeoDataFrame in place (do not create a new object)

  • crs (pyproj.CRS, optional) – Coordinate system to use. The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (eg “EPSG:4326”) or a WKT string. If passed, overrides both DataFrame and col’s crs. Otherwise, tries to get crs from passed col values or DataFrame.

Examples

>>> from sedona.spark.geopandas import GeoDataFrame
>>> from shapely.geometry import Point
>>> d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> gdf = GeoDataFrame(d, crs="EPSG:4326")
>>> gdf
    col1     geometry
0  name1  POINT (1 2)
1  name2  POINT (2 1)

Passing an array:

>>> df1 = gdf.set_geometry([Point(0,0), Point(1,1)])
>>> df1
    col1     geometry
0  name1  POINT (0 0)
1  name2  POINT (1 1)

Using existing column:

>>> gdf["buffered"] = gdf.buffer(2)
>>> df2 = gdf.set_geometry("buffered")
>>> df2.geometry
0    POLYGON ((3 2, 2.99037 1.80397, 2.96157 1.6098...
1    POLYGON ((4 1, 3.99037 0.80397, 3.96157 0.6098...
Name: buffered, dtype: geometry
Return type:

GeoDataFrame

See also

GeoDataFrame.rename_geometry

rename an active geometry column

sjoin(other, how='inner', predicate='intersects', lsuffix='left', rsuffix='right', distance=None, on_attribute=None, **kwargs)[source]

Spatial join of two GeoDataFrames.

Parameters:
  • other (GeoDataFrame) – The right GeoDataFrame to join with.

  • how (str, default 'inner') –

    The type of join:

    • ‘left’: use keys from left_df; retain only left_df geometry column

    • ‘right’: use keys from right_df; retain only right_df geometry column

    • ‘inner’: use intersection of keys from both dfs; retain only left_df geometry column

  • predicate (str, default 'intersects') – Binary predicate. Valid values: ‘intersects’, ‘contains’, ‘within’, ‘dwithin’

  • lsuffix (str, default 'left') – Suffix to apply to overlapping column names (left GeoDataFrame).

  • rsuffix (str, default 'right') – Suffix to apply to overlapping column names (right GeoDataFrame).

  • distance (float, optional) – Distance for ‘dwithin’ predicate. Required if predicate=’dwithin’.

  • on_attribute (str, list or tuple, optional) – Column name(s) to join on as an additional join restriction. These must be found in both DataFrames.

  • **kwargs – Additional keyword arguments passed to the spatial join function.

Returns:

A GeoDataFrame with the results of the spatial join.

Return type:

GeoDataFrame

Examples

>>> from shapely.geometry import Point, Polygon
>>> from sedona.spark.geopandas import GeoDataFrame
>>> polygons = GeoDataFrame({
...     'geometry': [Polygon([(0, 0), (0, 1), (1, 1), (1, 0)])],
...     'value': [1]
... })
>>> points = GeoDataFrame({
...     'geometry': [Point(0.5, 0.5), Point(2, 2)],
...     'value': [1, 2]
... })
>>> joined = points.sjoin(polygons)
>>> joined
    geometry_left  value_left            geometry_right  value_right
0  POINT (0.5 0.5)           1  POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))            1
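
To make the semantics of an inner join concrete, here is a plain-Python sketch that reproduces the result above with a nested loop and a ray-casting point-in-polygon test. It is an illustration only: the helper names are hypothetical, geometries are plain coordinate tuples, boundary points are not handled robustly, and Sedona's distributed join works very differently.

```python
def point_in_polygon(point, ring):
    """Ray-casting test; points exactly on the boundary are not handled robustly."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sjoin_inner(left_rows, right_rows, lsuffix="left", rsuffix="right"):
    """Nested-loop inner join of point rows (left) against polygon rows (right)."""
    joined = []
    for left in left_rows:
        for right in right_rows:
            if not point_in_polygon(left["geometry"], right["geometry"]):
                continue
            # Only column names present on both sides receive a suffix.
            overlap = set(left) & set(right)
            row = {(f"{k}_{lsuffix}" if k in overlap else k): v for k, v in left.items()}
            row.update({(f"{k}_{rsuffix}" if k in overlap else k): v for k, v in right.items()})
            joined.append(row)
    return joined


polygons = [{"geometry": [(0, 0), (0, 1), (1, 1), (1, 0)], "value": 1}]
points = [{"geometry": (0.5, 0.5), "value": 1}, {"geometry": (2, 2), "value": 2}]
print(sjoin_inner(points, polygons))
```

Only the point at (0.5, 0.5) falls inside the square, so the inner join keeps a single row, as in the example output.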
to_arrow(*, index: bool | None = None, geometry_encoding='WKB', interleaved: bool = True, include_z: bool | None = None)[source]

Encode a GeoDataFrame to GeoArrow format. See https://geoarrow.org/ for details on the GeoArrow specification.

This function returns a generic Arrow data object implementing the Arrow PyCapsule Protocol (i.e. having an __arrow_c_stream__ method; see https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html). This object can then be consumed by your Arrow implementation of choice that supports this protocol.

Note: Requires geopandas versions >= 1.0.0 to use with Sedona.

Parameters:
  • index (bool, default None) –

    If True, always include the dataframe’s index(es) as columns in the file output. If False, the index(es) will not be written to the file. If None, the index(es) will be included as columns in the file output except RangeIndex which is stored as metadata only.

    Note: Unlike in geopandas, None will include the index in the column because Sedona always converts RangeIndex into a general Index.

  • geometry_encoding ({'WKB', 'geoarrow' }, default 'WKB') – The GeoArrow encoding to use for the data conversion.

  • interleaved (bool, default True) – Only relevant for ‘geoarrow’ encoding. If True, the geometries’ coordinates are interleaved in a single fixed size list array. If False, the coordinates are stored as separate arrays in a struct type.

  • include_z (bool, default None) – Only relevant for ‘geoarrow’ encoding (for WKB, the dimensionality of the individual geometries is preserved). If False, return 2D geometries. If True, include the third dimension in the output (if a geometry has no third dimension, the z-coordinates will be NaN). By default, will infer the dimensionality from the input geometries. Note that this inference can be unreliable with empty geometries (for a guaranteed result, it is recommended to specify the keyword).

Returns:

A generic Arrow table object with geometry columns encoded to GeoArrow.

Return type:

ArrowTable

Examples

>>> from sedona.spark.geopandas import GeoDataFrame
>>> from shapely.geometry import Point
>>> data = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> gdf = GeoDataFrame(data)
>>> gdf
    col1     geometry
0  name1  POINT (1 2)
1  name2  POINT (2 1)
>>> arrow_table = gdf.to_arrow(index=False)
>>> arrow_table
<geopandas.io._geoarrow.ArrowTable object at ...>

The returned data object needs to be consumed by a library implementing the Arrow PyCapsule Protocol. For example, wrapping the data as a pyarrow.Table (requires pyarrow >= 14.0):

>>> import pyarrow as pa  # requires: pip install pyarrow
>>> table = pa.table(arrow_table)
>>> table
pyarrow.Table
col1: string
geometry: binary
----
col1: [["name1","name2"]]
geometry: [[0101000000000000000000F03F0000000000000040,01010000000000000000000040000000000000F03F]]
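
The hexadecimal values in the geometry column above are WKB-encoded points. As a rough illustration of what the 'WKB' encoding stores, the sketch below decodes a 2D point using only the standard library; the helper decode_wkb_point is hypothetical and handles nothing but plain 2D points.

```python
import struct


def decode_wkb_point(hex_wkb):
    """Decode a hex-encoded WKB 2D point into an (x, y) tuple."""
    raw = bytes.fromhex(hex_wkb)
    # Byte 0 is the byte-order flag: 1 = little endian, 0 = big endian.
    byte_order = "<" if raw[0] == 1 else ">"
    # Bytes 1-4 hold the geometry type code; 1 means Point.
    (geom_type,) = struct.unpack(byte_order + "I", raw[1:5])
    if geom_type != 1:
        raise ValueError(f"not a 2D point (type code {geom_type})")
    # Bytes 5-20 hold the x and y coordinates as two 8-byte doubles.
    x, y = struct.unpack(byte_order + "dd", raw[5:21])
    return x, y


print(decode_wkb_point("0101000000000000000000F03F0000000000000040"))  # (1.0, 2.0)
print(decode_wkb_point("01010000000000000000000040000000000000F03F"))  # (2.0, 1.0)
```

The two decoded tuples correspond to POINT (1 2) and POINT (2 1) from the GeoDataFrame.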
to_crs(crs: Any | None = None, epsg: int | None = None, inplace: bool = False) GeoDataFrame | None[source]

Transform geometries to a new coordinate reference system.

Transform all geometries in an active geometry column to a different coordinate reference system. The crs attribute on the current GeoSeries must be set. Either crs or epsg may be specified for output.

This method will transform all points in all objects. It has no notion of projecting entire geometries. All segments joining points are assumed to be lines in the current projection, not geodesics. Objects crossing the dateline (or other projection boundary) will have undesirable behavior.

Parameters:
  • crs (pyproj.CRS, optional if epsg is specified) – The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (eg “EPSG:4326”) or a WKT string.

  • epsg (int, optional if crs is specified) – EPSG code specifying output projection.

  • inplace (bool, optional, default: False) – Whether to return a new GeoDataFrame or do the transformation in place.

Return type:

GeoDataFrame

Examples

>>> from shapely.geometry import Point
>>> from sedona.spark.geopandas import GeoDataFrame
>>> d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> gdf = GeoDataFrame(d, crs=4326)
>>> gdf
    col1     geometry
0  name1  POINT (1 2)
1  name2  POINT (2 1)
>>> gdf.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
>>> gdf = gdf.to_crs(3857)
>>> gdf
    col1                       geometry
0  name1  POINT (111319.491 222684.209)
1  name2  POINT (222638.982 111325.143)
>>> gdf.crs
<Projected CRS: EPSG:3857>
Name: WGS 84 / Pseudo-Mercator
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: World - 85°S to 85°N
- bounds: (-180.0, -85.06, 180.0, 85.06)
Coordinate Operation:
- name: Popular Visualisation Pseudo-Mercator
- method: Popular Visualisation Pseudo Mercator
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

See also

GeoDataFrame.set_crs

assign CRS without re-projection
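
For the specific EPSG:4326 to EPSG:3857 case in the example, the per-point transformation can be written out by hand with the spherical Web Mercator formulas. This is a sketch of the arithmetic only; the function name is hypothetical, and real reprojection goes through the CRS machinery rather than this code.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, the sphere radius used by EPSG:3857


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Forward spherical Mercator projection for a single point."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4.0 + math.radians(lat_deg) / 2.0))
    return x, y


# POINT (1 2) and POINT (2 1) from the example, read as (longitude, latitude).
for lon, lat in [(1.0, 2.0), (2.0, 1.0)]:
    x, y = lonlat_to_web_mercator(lon, lat)
    print(round(x, 3), round(y, 3))
```

Each vertex is transformed independently, which is why segments between vertices stay straight in the target projection rather than following geodesics.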

to_feather(path, index: bool | None = None, compression: str | None = None, schema_version=None, **kwargs)[source]
to_file(path: str, driver: str | None = None, schema: dict | None = None, index: bool | None = None, **kwargs)[source]

Write the GeoDataFrame to a file.

Parameters:
  • path (str) – File path or file handle to write to.

  • driver (str, default None) –

    The format driver used to write the file. If not specified, it attempts to infer it from the file extension. If no extension is specified, Sedona will error.

    Options: “geojson”, “geopackage”, “geoparquet”

  • schema (dict, default None) – Not applicable to Sedona’s implementation.

  • index (bool, default None) – If True, write index into one or more columns (for MultiIndex). Default None writes the index into one or more columns only if the index is named, is a MultiIndex, or has a non-integer data type. If False, no index is written.

  • **kwargs

    Additional keyword arguments:

    mode : str, default ‘w’

    The write mode, ‘w’ to overwrite the existing file and ‘a’ to append. ‘overwrite’ and ‘append’ are equivalent to ‘w’ and ‘a’ respectively.

    crs : pyproj.CRS, default None

    If specified, the CRS is passed to Fiona to better control how the file is written. If None, GeoPandas will determine the CRS based on the crs attribute. The value can be anything accepted by pyproj.CRS.from_user_input, such as an authority string (e.g., “EPSG:4326”) or a WKT string.

    engine : str

    Not applicable to Sedona’s implementation.

    metadata : dict[str, str], default None

    Optional metadata to be stored in the file. Keys and values must be strings. Supported only for “GPKG” driver. Not supported by Sedona.

Examples

>>> from shapely.geometry import Point, LineString
>>> from sedona.spark.geopandas import GeoDataFrame
>>> gdf = GeoDataFrame({
...     "geometry": [Point(0, 0), LineString([(0, 0), (1, 1)])],
...     "int": [1, 2]
... })
>>> gdf.to_file("output.parquet", driver="geoparquet")

With selected drivers you can also append to a file with mode="a":

>>> gdf.to_file("output.geojson", driver="geojson", mode="a")

When the index is of non-integer dtype, index=None (default) is treated as True, writing the index to the file.

>>> gdf = GeoDataFrame({"geometry": [Point(0, 0), Point(1, 1)]}, index=["a", "b"])
>>> gdf.to_file("output_with_index.parquet", driver="geoparquet")
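
The documented rules for the mode aliases and for index=None can be restated as two small functions. Both helpers and their argument names are hypothetical, written here only to spell out the decision logic described in the parameter list; they are not Sedona's code.

```python
# 'w'/'overwrite' and 'a'/'append' are documented as equivalent spellings.
_MODE_ALIASES = {"w": "overwrite", "overwrite": "overwrite", "a": "append", "append": "append"}


def normalize_mode(mode="w"):
    """Map any accepted mode spelling onto one canonical name."""
    try:
        return _MODE_ALIASES[mode]
    except KeyError:
        raise ValueError(f"unsupported mode: {mode!r}") from None


def should_write_index(index=None, *, named=False, multi_level=False, integer_dtype=True):
    """Decide whether the index is written, following the documented index rule."""
    if index is not None:
        # An explicit True or False always wins.
        return bool(index)
    # index=None: write only a named, multi-level, or non-integer index.
    return named or multi_level or not integer_dtype


print(normalize_mode("a"))
print(should_write_index(None, integer_dtype=False))
```

With a string index such as ["a", "b"], integer_dtype is False, so the default index=None results in the index being written.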
to_geo_dict(na: str | None = 'null', show_bbox: bool = False, drop_id: bool = False) dict[source]
to_geopandas() GeoDataFrame[source]

Note: Unlike pandas and geopandas, which return a RangeIndex by default, Sedona always returns a general Index, e.g., pd.Index([0, 1, 2]) instead of pd.RangeIndex(start=0, stop=3, step=1).

to_json(na: Literal['null', 'drop', 'keep'] = 'null', show_bbox: bool = False, drop_id: bool = False, to_wgs84: bool = False, **kwargs) str[source]

Returns a GeoJSON representation of the GeoDataFrame as a string.

Parameters:
  • na ({'null', 'drop', 'keep'}, default 'null') –

    Dictates how to represent missing (NaN) values in the output.

    • null: Outputs missing entries as JSON null.

    • drop: Removes the entire property from a feature if its value is missing.

    • keep: Outputs missing entries as NaN.

  • show_bbox (bool, default False) – If True, the bbox (bounds) of the geometries is included in the output.

  • drop_id (bool, default False) – If True, the GeoDataFrame index is not written to the ‘id’ field of each GeoJSON Feature.

  • to_wgs84 (bool, default False) – If True, all geometries are transformed to WGS84 (EPSG:4326) to meet the 2016 GeoJSON specification. When False, the current CRS is exported if it’s set.

  • **kwargs – Additional keyword arguments passed to json.dumps().

Returns:

A GeoJSON representation of the GeoDataFrame.

Return type:

str

See also

GeoDataFrame.to_file

Write a GeoDataFrame to a file, which can be used for GeoJSON format.

Examples

>>> from sedona.spark.geopandas import GeoDataFrame
>>> from shapely.geometry import Point
>>> d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> gdf = GeoDataFrame(d, crs="EPSG:3857")
>>> gdf.to_json()
'{"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature", "properties": {"col1": "name1"}, "geometry": {"type": "Point", "coordinates": [1.0, 2.0]}}, {"id": "1", "type": "Feature", "properties": {"col1": "name2"}, "geometry": {"type": "Point", "coordinates": [2.0, 1.0]}}], "crs": {"type": "name", "properties": {"name": "urn:ogc:def:crs:EPSG::3857"}}}'
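
The three na options differ only in how a missing property value is serialized. The sketch below applies them to one feature's properties with the standard json module; feature_properties is a hypothetical helper, not the method's implementation.

```python
import json
import math


def feature_properties(row, na="null"):
    """Build a GeoJSON 'properties' dict, treating NaN according to `na`."""
    if na not in ("null", "drop", "keep"):
        raise ValueError(f"unknown na option: {na!r}")
    props = {}
    for key, value in row.items():
        missing = isinstance(value, float) and math.isnan(value)
        if missing and na == "drop":
            continue  # leave the property out entirely
        if missing and na == "null":
            value = None  # serialized as JSON null
        props[key] = value  # with 'keep', NaN passes through unchanged
    return props


row = {"col1": "name1", "score": float("nan")}
print(json.dumps(feature_properties(row, "null")))  # {"col1": "name1", "score": null}
print(json.dumps(feature_properties(row, "drop")))  # {"col1": "name1"}
print(json.dumps(feature_properties(row, "keep")))  # {"col1": "name1", "score": NaN}
```

Note that NaN is not valid in strict JSON, so output produced with 'keep' may be rejected by some parsers.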

to_parquet(path, **kwargs)[source]

Write the GeoDataFrame to a GeoParquet file.

Parameters:
  • path (str) – The file path where the GeoParquet file will be written.

  • **kwargs – Additional arguments to pass to the Sedona DataFrame output function.

Examples

>>> from shapely.geometry import Point
>>> from sedona.spark.geopandas import GeoDataFrame
>>> gdf = GeoDataFrame({"geometry": [Point(0, 0), Point(1, 1)], "value": [1, 2]})
>>> gdf.to_parquet("output.parquet")
to_spark_pandas() pyspark.pandas.DataFrame[source]

Convert the GeoDataFrame to a Spark Pandas DataFrame.

to_wkb(hex: bool = False, **kwargs) DataFrame[source]
to_wkt(**kwargs) DataFrame[source]
property type
exception sedona.spark.geopandas.geodataframe.MissingGeometryColumnError[source]

Bases: Exception

sedona.spark.geopandas.geoseries module

class sedona.spark.geopandas.geoseries.GeoSeries(*args: Any, **kwargs: Any)[source]

Bases: GeoFrame, Series

A pandas-on-Spark Series for geometric/spatial operations.

GeoSeries extends pyspark.pandas.Series to provide spatial operations using Apache Sedona’s spatial functions. It maintains compatibility with GeoPandas GeoSeries while operating on distributed datasets.

Parameters:
  • data (array-like, Iterable, dict, or scalar value) – Contains the data for the GeoSeries. Can be geometries, WKB bytes, or other GeoSeries/GeoDataFrame objects.

  • index (array-like or Index (1d), optional) – Values must be hashable and have the same length as data.

  • crs (pyproj.CRS, optional) – Coordinate Reference System for the geometries.

  • dtype (dtype, optional) – Data type for the GeoSeries.

  • name (str, optional) – Name of the GeoSeries.

  • copy (bool, default False) – Whether to copy the input data.

Examples

>>> from shapely.geometry import Point, Polygon
>>> from sedona.spark.geopandas import GeoSeries
>>>
>>> # Create from geometries
>>> s = GeoSeries([Point(0, 0), Point(1, 1)], crs='EPSG:4326')
>>> s
0    POINT (0 0)
1    POINT (1 1)
dtype: geometry
>>>
>>> # Spatial operations
>>> s.buffer(0.1).area
0    0.031416
1    0.031416
dtype: float64
>>>
>>> # CRS operations
>>> s_utm = s.to_crs('EPSG:32633')
>>> s_utm.crs
<Projected CRS: EPSG:32633>
Name: WGS 84 / UTM zone 33N
...

Notes

This implementation differs from GeoPandas in several ways:

  • Uses Spark for distributed processing

  • Geometries are stored in WKB (Well-Known Binary) format internally

  • Some methods may have different performance characteristics

  • Not all GeoPandas methods are implemented yet (see IMPLEMENTATION_STATUS)

Performance Considerations:

  • Operations are distributed across Spark cluster

  • Avoid calling .to_geopandas() on large datasets

  • Use .sample() for testing with large datasets

See also

geopandas.GeoSeries

The GeoPandas equivalent

sedona.spark.geopandas.GeoDataFrame

DataFrame with geometry column

__init__(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False, crs=None, **kwargs)[source]

Initialize a GeoSeries object.

Parameters:
  • data – The input data for the GeoSeries. It can be a GeoDataFrame, GeoSeries, or pandas Series.

  • index – The index for the GeoSeries.

  • crs – Coordinate Reference System for the GeoSeries.

  • dtype – Data type for the GeoSeries.

  • name – Name of the GeoSeries.

  • copy – Whether to copy the input data.

  • fastpath – Internal parameter for fast initialization.

Examples

>>> from shapely.geometry import Point
>>> import geopandas as gpd
>>> import pandas as pd
>>> from sedona.spark.geopandas import GeoSeries

>>> # Example 1: Initialize with GeoDataFrame
>>> gdf = gpd.GeoDataFrame({'geometry': [Point(1, 1), Point(2, 2)]})
>>> gs = GeoSeries(data=gdf)
>>> print(gs)
0    POINT (1 1)
1    POINT (2 2)
Name: geometry, dtype: geometry

>>> # Example 2: Initialize with GeoSeries
>>> gseries = gpd.GeoSeries([Point(1, 1), Point(2, 2)])
>>> gs = GeoSeries(data=gseries)
>>> print(gs)
0    POINT (1 1)
1    POINT (2 2)
dtype: geometry

>>> # Example 3: Initialize with pandas Series
>>> pseries = pd.Series([Point(1, 1), Point(2, 2)])
>>> gs = GeoSeries(data=pseries)
>>> print(gs)
0    POINT (1 1)
1    POINT (2 2)
dtype: geometry

property area: pyspark.pandas.Series

Returns a Series containing the area of each geometry in the GeoSeries expressed in the units of the CRS.

Returns:

A Series containing the area of each geometry.

Return type:

Series

Examples

>>> from shapely.geometry import Polygon
>>> from sedona.spark.geopandas import GeoSeries
>>> gs = GeoSeries([Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]), Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])])
>>> gs.area
0    1.0
1    4.0
dtype: float64
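
The areas above can be verified by hand with the shoelace formula. The following sketch works on plain coordinate lists for a single ring without holes; it is an independent check of the numbers, not a description of how Sedona computes area.

```python
def ring_area(coords):
    """Area of a simple polygon ring via the shoelace formula."""
    n = len(coords)
    twice_signed_area = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to close the ring
        twice_signed_area += x1 * y2 - x2 * y1
    # The sign depends on vertex order, so take the absolute value.
    return abs(twice_signed_area) / 2.0


print(ring_area([(0, 0), (1, 0), (1, 1), (0, 1)]))  # 1.0
print(ring_area([(0, 0), (2, 0), (2, 2), (0, 2)]))  # 4.0
```

The result is in squared CRS units, so for geographic coordinates it is in square degrees rather than a ground area.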
property boundary: GeoSeries

Returns a GeoSeries of lower dimensional objects representing each geometry’s set-theoretic boundary.

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...         Point(0, 0),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1        LINESTRING (0 0, 1 1, 1 0)
2                       POINT (0 0)
dtype: geometry
>>> s.boundary
0    LINESTRING (0 0, 1 1, 0 1, 0 0)
1          MULTIPOINT ((0 0), (1 0))
2           GEOMETRYCOLLECTION EMPTY
dtype: geometry

See also

GeoSeries.exterior

outer boundary (without interior rings)

property bounds: pyspark.pandas.DataFrame

Returns a DataFrame with columns minx, miny, maxx, maxy values containing the bounds for each geometry.

See GeoSeries.total_bounds for the limits of the entire series.

Examples

>>> import geopandas
>>> from shapely.geometry import Point, Polygon, LineString
>>> d = {'geometry': [Point(2, 1), Polygon([(0, 0), (1, 1), (1, 0)]),
... LineString([(0, 1), (1, 2)])]}
>>> gdf = geopandas.GeoDataFrame(d, crs="EPSG:4326")
>>> gdf.bounds
   minx  miny  maxx  maxy
0   2.0   1.0   2.0   1.0
1   0.0   0.0   1.0   1.0
2   0.0   1.0   1.0   2.0

You can assign the bounds to the GeoDataFrame as:

>>> import pandas as pd
>>> gdf = pd.concat([gdf, gdf.bounds], axis=1)
>>> gdf
                        geometry  minx  miny  maxx  maxy
0                     POINT (2 1)   2.0   1.0   2.0   1.0
1  POLYGON ((0 0, 1 1, 1 0, 0 0))   0.0   0.0   1.0   1.0
2           LINESTRING (0 1, 1 2)   0.0   1.0   1.0   2.0
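
Each row of bounds is simply the minimum and maximum of a geometry's coordinates. A minimal sketch on plain coordinate lists (the helper name is hypothetical) reproduces the table above:

```python
def geometry_bounds(coords):
    """Return (minx, miny, maxx, maxy) for a sequence of (x, y) coordinates."""
    xs = [float(x) for x, _ in coords]
    ys = [float(y) for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


print(geometry_bounds([(2, 1)]))                  # (2.0, 1.0, 2.0, 1.0)
print(geometry_bounds([(0, 0), (1, 1), (1, 0)]))  # (0.0, 0.0, 1.0, 1.0)
print(geometry_bounds([(0, 1), (1, 2)]))          # (0.0, 1.0, 1.0, 2.0)
```

For a point, the minimum and maximum coincide, which is why its row repeats the same coordinates.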
buffer(distance, resolution=16, cap_style='round', join_style='round', mitre_limit=5.0, single_sided=False, **kwargs) GeoSeries[source]

Returns a GeoSeries with all geometries buffered by the specified distance.

Parameters:
  • distance (float) – The distance to buffer by. Negative distances will create inward buffers.

  • resolution (int, default 16) – The resolution of the buffer around each vertex. Specifies the number of linear segments in a quarter circle in the approximation of circular arcs.

  • cap_style (str, default "round") – The style of the buffer cap. One of ‘round’, ‘flat’, ‘square’.

  • join_style (str, default "round") – The style of the buffer join. One of ‘round’, ‘mitre’, ‘bevel’.

  • mitre_limit (float, default 5.0) – The mitre limit ratio for joins when join_style=’mitre’.

  • single_sided (bool, default False) – Whether to create a single-sided buffer. In Sedona, True will default to left-sided buffer. However, ‘right’ may be specified to use a right-sided buffer.

Returns:

A new GeoSeries with buffered geometries.

Return type:

GeoSeries

Examples

>>> from shapely.geometry import Point
>>> from sedona.spark.geopandas import GeoDataFrame
>>>
>>> data = {
...     'geometry': [Point(0, 0), Point(1, 1)],
...     'value': [1, 2]
... }
>>> gdf = GeoDataFrame(data)
>>> buffered = gdf.buffer(0.5)
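
To see what the resolution parameter means for a point buffer, the sketch below approximates the circle with 4 * resolution straight segments, i.e. resolution segments per quarter circle. It assumes the vertices lie on the circle and run clockwise from the positive x direction; the helper names are hypothetical and the buffering engine's real output may differ in vertex order and closing point.

```python
import math


def buffer_point(x, y, distance, resolution=16):
    """Approximate a round point buffer with `resolution` segments per quarter circle."""
    n = 4 * resolution
    return [
        (x + distance * math.cos(-2.0 * math.pi * i / n),
         y + distance * math.sin(-2.0 * math.pi * i / n))
        for i in range(n)
    ]


def ring_area(coords):
    """Shoelace area of a simple ring."""
    n = len(coords)
    s = sum(coords[i][0] * coords[(i + 1) % n][1] - coords[(i + 1) % n][0] * coords[i][1]
            for i in range(n))
    return abs(s) / 2.0


ring = buffer_point(0.0, 0.0, 0.5)
print(len(ring))                        # 64 vertices for resolution=16
print(ring_area(ring) < math.pi * 0.25)  # True: the polygon is inscribed in the circle
```

Because the polygon is inscribed, its area is always slightly below pi * distance**2, and the gap shrinks as resolution grows.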
property centroid: GeoSeries

Returns a GeoSeries of points representing the centroid of each geometry.

Note that centroid does not have to be on or within original geometry.

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...         Point(0, 0),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1        LINESTRING (0 0, 1 1, 1 0)
2                       POINT (0 0)
dtype: geometry
>>> s.centroid
0    POINT (0.33333 0.66667)
1        POINT (0.70711 0.5)
2                POINT (0 0)
dtype: geometry

See also

GeoSeries.representative_point

point guaranteed to be within each geometry
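
The centroids in the example above can be reproduced by hand. The sketch below uses the standard formulas on plain coordinate lists: a length-weighted average of segment midpoints for a line, and the signed-area formula for a polygon ring without holes. The helper names are hypothetical.

```python
import math


def linestring_centroid(coords):
    """Centroid of a polyline: segment midpoints weighted by segment length."""
    total = cx = cy = 0.0
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        length = math.hypot(x2 - x1, y2 - y1)
        total += length
        cx += length * (x1 + x2) / 2.0
        cy += length * (y1 + y2) / 2.0
    return cx / total, cy / total


def polygon_centroid(ring):
    """Centroid of a simple polygon ring via the signed-area formula."""
    twice_area = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        twice_area += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    # Cx = sum / (6 * A) and A = twice_area / 2, hence the factor 3.
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


print(tuple(round(v, 5) for v in polygon_centroid([(0, 0), (1, 1), (0, 1)])))     # (0.33333, 0.66667)
print(tuple(round(v, 5) for v in linestring_centroid([(0, 0), (1, 1), (1, 0)])))  # (0.70711, 0.5)
```

For the line, the centroid (0.70711, 0.5) does not lie on the line itself, which illustrates the note that a centroid need not be on or within the original geometry.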

clip(mask, keep_geom_type: bool = False, sort=False) GeoSeries[source]
concave_hull(ratio=0.0, allow_holes=False)[source]
contains(other, align=None) pyspark.pandas.Series[source]

Returns a Series of dtype('bool') with value True for each aligned geometry that contains other.

An object is said to contain other if at least one point of other lies in the interior and no points of other lie in the exterior of the object. (Therefore, any given polygon does not contain its own boundary - there is not any point that lies in the interior.) If either object is empty, this operation returns False.

This is the inverse of within in the sense that the expression a.contains(b) == b.within(a) always evaluates to True.

Note: Sedona’s implementation instead returns False for identical geometries.

The operation works in a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test if it is contained.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.

Return type:

Series (bool)

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (0, 2)]),
...         LineString([(0, 0), (0, 1)]),
...         Point(0, 1),
...     ],
...     index=range(0, 4),
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (1, 2), (0, 2)]),
...         LineString([(0, 0), (0, 2)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1             LINESTRING (0 0, 0 2)
2             LINESTRING (0 0, 0 1)
3                       POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2    POLYGON ((0 0, 1 2, 0 2, 0 0))
3             LINESTRING (0 0, 0 2)
4                       POINT (0 1)
dtype: geometry

We can check if each geometry of GeoSeries contains a single geometry:

>>> point = Point(0, 1)
>>> s.contains(point)
0    False
1     True
2    False
3     True
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s2.contains(s, align=True)
0    False
1    False
2    False
3     True
4    False
dtype: bool
>>> s2.contains(s, align=False)
1     True
2    False
3     True
4     True
dtype: bool

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries contains any element of the other one.
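
The difference between align=True and align=False can be sketched without Spark. The example below is an illustration under stated assumptions, not Sedona's implementation: plain dicts stand in for indexed GeoSeries, closed intervals stand in for geometries, and the helper rowwise is hypothetical. It mirrors the behavior shown above, where labels missing from either side yield False under align=True and the left-hand labels are kept under align=False.

```python
def rowwise(left, right, predicate, align=True):
    """Apply a binary predicate to two {label: value} dicts, row by row."""
    if align:
        # Pair values that share a label; unmatched labels evaluate to False.
        labels = sorted(set(left) | set(right))
        return {
            k: (k in left and k in right and predicate(left[k], right[k]))
            for k in labels
        }
    # align=False: pair values by position and keep the left-hand labels.
    return {k: predicate(a, b) for (k, a), b in zip(left.items(), right.values())}


def interval_contains(a, b):
    """True if closed interval a = (lo, hi) contains interval b."""
    return a[0] <= b[0] and b[1] <= a[1]


left = {1: (0, 10), 2: (0, 5), 3: (2, 3)}
right = {0: (1, 2), 1: (1, 2), 2: (4, 6)}

aligned = rowwise(left, right, interval_contains, align=True)
positional = rowwise(left, right, interval_contains, align=False)
print(aligned)     # {0: False, 1: True, 2: False, 3: False}
print(positional)  # {1: True, 2: True, 3: False}
```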

contains_properly(other, align=None)[source]
property convex_hull
copy(deep=False)[source]

Make a copy of this GeoSeries object.

Parameters:

deep (bool, default False) – If True, a deep copy of the data is made. Otherwise, a shallow copy is made.

Returns:

A copy of this GeoSeries object.

Return type:

GeoSeries

Examples

>>> from shapely.geometry import Point
>>> from sedona.spark.geopandas import GeoSeries
>>> gs = GeoSeries([Point(1, 1), Point(2, 2)])
>>> gs_copy = gs.copy()
>>> print(gs_copy)
0    POINT (1 1)
1    POINT (2 2)
dtype: geometry
count_coordinates()[source]
count_geometries()[source]
count_interior_rings()[source]
covered_by(other, align=None) pyspark.pandas.Series[source]

Returns a Series of dtype('bool') with value True for each aligned geometry that is entirely covered by other.

An object A is said to cover another object B if no points of B lie in the exterior of A.

Note: Sedona’s implementation instead returns False for identical geometries. Sedona’s behavior may differ from Geopandas for GeometryCollections.

The operation works in a 1-to-1 row-wise manner.

See https://lin-ear-th-inking.blogspot.com/2007/06/subtleties-of-ogc-covers-spatial.html for reference.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test whether it covers each geometry.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.

Return type:

Series (bool)

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]),
...         Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
...         LineString([(1, 1), (1.5, 1.5)]),
...         Point(0, 0),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         Point(0, 0),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0.5 0.5, 1.5 0.5, 1.5 1.5, 0.5 1.5, ...
1                  POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))
2                            LINESTRING (1 1, 1.5 1.5)
3                                          POINT (0 0)
dtype: geometry
>>>
>>> s2
1    POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))
2         POLYGON ((0 0, 2 2, 0 2, 0 0))
3                  LINESTRING (0 0, 2 2)
4                            POINT (0 0)
dtype: geometry

We can check if each geometry of GeoSeries is covered by a single geometry:

>>> poly = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
>>> s.covered_by(poly)
0    True
1    True
2    True
3    True
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.covered_by(s2, align=True)
0    False
1     True
2     True
3     True
4    False
dtype: bool
>>> s.covered_by(s2, align=False)
0     True
1    False
2     True
3     True
dtype: bool

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries is covered_by any element of the other one.

covers(other, align=None) pyspark.pandas.Series[source]

Returns a Series of dtype('bool') with value True for each aligned geometry that is entirely covering other.

An object A is said to cover another object B if no points of B lie in the exterior of A. If either object is empty, this operation returns False.

Note: Sedona’s implementation instead returns False for identical geometries. Sedona’s behavior may also differ from Geopandas for GeometryCollections.

The operation works in a 1-to-1 row-wise manner.

See https://lin-ear-th-inking.blogspot.com/2007/06/subtleties-of-ogc-covers-spatial.html for reference.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test whether it is covered.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.

Return type:

Series (bool)

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         Point(0, 0),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]),
...         Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
...         LineString([(1, 1), (1.5, 1.5)]),
...         Point(0, 0),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))
1         POLYGON ((0 0, 2 2, 0 2, 0 0))
2                  LINESTRING (0 0, 2 2)
3                            POINT (0 0)
dtype: geometry
>>> s2
1    POLYGON ((0.5 0.5, 1.5 0.5, 1.5 1.5, 0.5 1.5, ...
2                  POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))
3                            LINESTRING (1 1, 1.5 1.5)
4                                          POINT (0 0)
dtype: geometry

We can check if each geometry of GeoSeries covers a single geometry:

>>> poly = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
>>> s.covers(poly)
0     True
1    False
2    False
3    False
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.covers(s2, align=True)
0    False
1    False
2    False
3    False
4    False
dtype: bool
>>> s.covers(s2, align=False)
0     True
1    False
2     True
3     True
dtype: bool

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries covers any element of the other one.

crosses(other, align=None) pyspark.pandas.Series[source]

Returns a Series of dtype('bool') with value True for each aligned geometry that crosses other.

An object is said to cross other if its interior intersects the interior of the other but does not contain it, and the dimension of the intersection is less than the dimension of the one or the other.

Note: Unlike GeoPandas, Sedona’s implementation always returns NULL when a GeometryCollection is involved.

The operation works in a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test if it is crossed.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.

Return type:

Series (bool)

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         LineString([(1, 0), (1, 3)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(1, 1),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1             LINESTRING (0 0, 2 2)
2             LINESTRING (2 0, 0 2)
3                       POINT (0 1)
dtype: geometry
>>> s2
1    LINESTRING (1 0, 1 3)
2    LINESTRING (2 0, 0 2)
3              POINT (1 1)
4              POINT (0 1)
dtype: geometry

We can check if each geometry of GeoSeries crosses a single geometry:

>>> line = LineString([(-1, 1), (3, 1)])
>>> s.crosses(line)
0     True
1     True
2     True
3    False
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.crosses(s2, align=True)
0    False
1     True
2    False
3    False
4    False
dtype: bool
>>> s.crosses(s2, align=False)
0     True
1     True
2    False
3    False
dtype: bool

Notice that a line does not cross a point that it contains.

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries crosses any element of the other one.

property crs: CRS | None

The Coordinate Reference System (CRS) as a pyproj.CRS object.

Returns a pyproj.CRS, or None if the CRS is not set. When setting, the value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g. “EPSG:4326”) or a WKT string.

Note: All records in the GeoSeries are assumed to have the same CRS.

Examples

>>> from shapely.geometry import Point
>>> from sedona.spark.geopandas import GeoSeries
>>> s = GeoSeries([Point(1, 1), Point(2, 2)], crs='EPSG:4326')
>>> s.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

See also

GeoSeries.set_crs

assign CRS

GeoSeries.to_crs

re-project to another CRS

delaunay_triangles(tolerance=0.0, only_edges=False)[source]
difference(other, align=None) GeoSeries[source]

Returns a GeoSeries of the points in each aligned geometry that are not in other.

The operation works in a 1-to-1 row-wise manner.

Unlike Geopandas, Sedona does not support this operation for GeometryCollections.

Parameters:
  • other (Geoseries or geometric object) – The Geoseries (elementwise) or geometric object to find the difference to.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.

Return type:

GeoSeries

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(1, 0), (1, 3)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(1, 1),
...         Point(0, 1),
...     ],
...     index=range(1, 6),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2             LINESTRING (0 0, 2 2)
3             LINESTRING (2 0, 0 2)
4                       POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2             LINESTRING (1 0, 1 3)
3             LINESTRING (2 0, 0 2)
4                       POINT (1 1)
5                       POINT (0 1)
dtype: geometry

We can compute the difference between each geometry of a GeoSeries and a single geometry:

>>> point = Point(0, 1)
>>> s2.difference(point)
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2             LINESTRING (1 0, 1 3)
3             LINESTRING (2 0, 0 2)
4                       POINT (1 1)
5           GEOMETRYCOLLECTION EMPTY
dtype: geometry

We can also compute the difference between two GeoSeries, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and use elements with the same index using align=True or ignore the index and use elements based on their matching order using align=False:

>>> s.difference(s2, align=True)
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2             LINESTRING (0 0, 2 2)
3             LINESTRING (2 0, 0 2)
4                       POINT (0 1)
5                       POINT (0 1)
dtype: geometry
>>> s.difference(s2, align=False)
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2           GEOMETRYCOLLECTION EMPTY
3             LINESTRING (2 0, 0 2)
4           GEOMETRYCOLLECTION EMPTY
dtype: geometry

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries is different from any element of the other one.

disjoint(other, align=None)[source]
distance(other, align=None) pyspark.pandas.Series[source]

Returns a Series containing the distance to aligned other.

The operation works in a 1-to-1 row-wise manner.

Parameters:
  • other (Geoseries or geometric object) – The Geoseries (elementwise) or geometric object to find the distance to.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.

Return type:

Series (float)

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 0), (1, 1)]),
...         Polygon([(0, 0), (-1, 0), (-1, 1)]),
...         LineString([(1, 1), (0, 0)]),
...         Point(0, 0),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]),
...         Point(3, 1),
...         LineString([(1, 0), (2, 0)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0      POLYGON ((0 0, 1 0, 1 1, 0 0))
1    POLYGON ((0 0, -1 0, -1 1, 0 0))
2               LINESTRING (1 1, 0 0)
3                         POINT (0 0)
dtype: geometry
>>> s2
1    POLYGON ((0.5 0.5, 1.5 0.5, 1.5 1.5, 0.5 1.5, ...
2                                          POINT (3 1)
3                                LINESTRING (1 0, 2 0)
4                                          POINT (0 1)
dtype: geometry

We can check the distance of each geometry of GeoSeries to a single geometry:

>>> point = Point(-1, 0)
>>> s.distance(point)
0    1.0
1    0.0
2    1.0
3    1.0
dtype: float64

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and use elements with the same index using align=True or ignore index and use elements based on their matching order using align=False:

>>> s.distance(s2, align=True)
0         NaN
1    0.707107
2    2.000000
3    1.000000
4         NaN
dtype: float64
>>> s.distance(s2, align=False)
0    0.000000
1    3.162278
2    0.707107
3    1.000000
dtype: float64
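
Two of the values above can be checked by hand with the standard point-to-segment distance formula. This is a minimal planar sketch, not Sedona's implementation, and the helper point_segment_distance is hypothetical. It projects the point onto the segment, clamps the projection to the segment's endpoints, and measures the remaining gap.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Parameter of the orthogonal projection, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


# Point (3, 1) against LINESTRING (1 1, 0 0): row 2 of the align=True result.
d_point = point_segment_distance((3, 1), (1, 1), (0, 0))
# Endpoint (1, 0) of LINESTRING (1 0, 2 0) against LINESTRING (1 1, 0 0):
# the closest approach in row 2 of the align=False result.
d_line = point_segment_distance((1, 0), (1, 1), (0, 0))
print(round(d_point, 6))  # 2.0
print(round(d_line, 6))   # 0.707107
```
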
dwithin(other, distance, align=None)[source]

Returns a Series of dtype('bool') with value True for each aligned geometry that is within a set distance from other.

The operation works in a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test if it is within the given distance.

  • distance (float, np.array, pd.Series) – Distance(s) to test if each geometry is within. A scalar distance will be applied to all geometries. An array or Series will be applied elementwise and must have the same length as the GeoSeries.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.

Return type:

Series (bool)

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (0, 2)]),
...         LineString([(0, 0), (0, 1)]),
...         Point(0, 1),
...     ],
...     index=range(0, 4),
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(1, 0), (4, 2), (2, 2)]),
...         Polygon([(2, 0), (3, 2), (2, 2)]),
...         LineString([(2, 0), (2, 2)]),
...         Point(1, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1             LINESTRING (0 0, 0 2)
2             LINESTRING (0 0, 0 1)
3                       POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((1 0, 4 2, 2 2, 1 0))
2    POLYGON ((2 0, 3 2, 2 2, 2 0))
3             LINESTRING (2 0, 2 2)
4                       POINT (1 1)
dtype: geometry

We can check if each geometry of GeoSeries is within a set distance of a single geometry:

>>> point = Point(0, 1)
>>> s2.dwithin(point, 1.8)
1     True
2    False
3    False
4     True
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.dwithin(s2, distance=1, align=True)
0    False
1     True
2    False
3    False
4    False
dtype: bool
>>> s.dwithin(s2, distance=1, align=False)
0     True
1    False
2    False
3     True
dtype: bool

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries is within the set distance of any element of the other one.
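
The scalar versus elementwise handling of the distance parameter can be sketched with plain points. This is an illustration only: the helper dwithin_points is hypothetical, it handles 2D points rather than arbitrary geometries, and it assumes an inclusive comparison (distance <= threshold), which is consistent with the align=True example above where a distance of exactly 1 yields True.

```python
import math


def dwithin_points(left, right, distance):
    """Row-wise inclusive distance test for two equal-length lists of points."""
    if isinstance(distance, (int, float)):
        # A scalar distance applies to every row.
        distance = [distance] * len(left)
    if len(distance) != len(left):
        raise ValueError("distance must have the same length as the inputs")
    return [math.dist(a, b) <= d for a, b, d in zip(left, right, distance)]


left = [(0, 1), (0, 1), (0, 1)]
right = [(1, 1), (2, 1), (0, 3)]

scalar_result = dwithin_points(left, right, 1)
elementwise_result = dwithin_points(left, right, [0.5, 2, 2])
print(scalar_result)       # [True, False, False]
print(elementwise_result)  # [False, True, True]
```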

See also

GeoSeries.within

property envelope: GeoSeries

Returns a GeoSeries of geometries representing the envelope of each geometry.

The envelope of a geometry is the bounding rectangle. That is, the point or smallest rectangular polygon (with sides parallel to the coordinate axes) that contains the geometry.

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point, MultiPoint
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...         MultiPoint([(0, 0), (1, 1)]),
...         Point(0, 0),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1        LINESTRING (0 0, 1 1, 1 0)
2         MULTIPOINT ((0 0), (1 1))
3                       POINT (0 0)
dtype: geometry
>>> s.envelope
0    POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))
1    POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))
2    POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))
3                            POINT (0 0)
dtype: geometry
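
The envelope is the axis-aligned bounding box of the coordinates. The sketch below shows the computation in pure Python. The helper envelope is hypothetical and simplified: it returns a single coordinate for a point-like input and a closed ring otherwise, and it does not special-case horizontal or vertical lines.

```python
def envelope(coords):
    """Axis-aligned bounding ring of a list of (x, y) coordinates."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    minx, miny, maxx, maxy = min(xs), min(ys), max(xs), max(ys)
    if minx == maxx and miny == maxy:
        return [(minx, miny)]  # degenerate case: a single point
    # Counter-clockwise ring starting at the lower-left corner.
    return [(minx, miny), (maxx, miny), (maxx, maxy), (minx, maxy), (minx, miny)]


box = envelope([(0, 0), (1, 1), (1, 0)])
point = envelope([(0, 0)])
print(box)    # [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(point)  # [(0, 0)]
```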

See also

GeoSeries.convex_hull

convex hull geometry

estimate_utm_crs(datum_name: str = 'WGS 84') CRS[source]

Returns the estimated UTM CRS based on the bounds of the dataset.

Parameters:

datum_name (str, optional) – The name of the datum to use in the query. Default is WGS 84.

Return type:

pyproj.CRS

Examples

>>> import geodatasets  # requires: pip install geodatasets
>>> import geopandas as gpd
>>> df = gpd.read_file(
...     geodatasets.get_path("geoda.chicago_commpop")
... )
>>> df.geometry.values.estimate_utm_crs()
<Derived Projected CRS: EPSG:32616>
Name: WGS 84 / UTM zone 16N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Between 90°W and 84°W, northern hemisphere between equator and 84°N,...
- bounds: (-90.0, 0.0, -84.0, 84.0)
Coordinate Operation:
- name: UTM zone 16N
- method: Transverse Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
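
For intuition, the EPSG code above matches the standard UTM zone arithmetic for WGS 84. The sketch below is not how the method is implemented (it relies on a CRS database query over the dataset bounds). The helper utm_epsg_for is hypothetical, the sample coordinates are approximate, and the special zones around Norway and Svalbard are ignored.

```python
import math


def utm_epsg_for(lon, lat):
    """Return (zone, EPSG code) of the WGS 84 UTM zone containing lon/lat."""
    # Zones are 6 degrees wide, numbered 1..60 eastward from 180°W.
    zone = int(math.floor((lon + 180) / 6)) % 60 + 1
    base = 32600 if lat >= 0 else 32700  # northern vs southern hemisphere
    return zone, base + zone


chicago = utm_epsg_for(-87.63, 41.88)  # approximate centre of Chicago
sydney = utm_epsg_for(151.2, -33.9)    # approximate centre of Sydney
print(chicago)  # (16, 32616)
print(sydney)   # (56, 32756)
```
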
explode(ignore_index=False, index_parts=False) GeoSeries[source]
property exterior
extract_unique_points()[source]
fillna(value=None, inplace: bool = False, limit=None, **kwargs) GeoSeries | None[source]

Fill NA values with geometry (or geometries).

Parameters:
  • value (shapely geometry or GeoSeries, default None) – If None is passed, NA values will be filled with GEOMETRYCOLLECTION EMPTY. If a shapely geometry object is passed, it will be used to fill all missing values. If a GeoSeries is passed, missing values will be filled based on the corresponding index locations. If pd.NA or np.nan are passed, values will be filled with None (not GEOMETRYCOLLECTION EMPTY).

  • limit (int, default None) – This is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.

Return type:

GeoSeries

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         None,
...         Polygon([(0, 0), (-1, 1), (0, -1)]),
...     ]
... )
>>> s
0      POLYGON ((0 0, 1 1, 0 1, 0 0))
1                                None
2    POLYGON ((0 0, -1 1, 0 -1, 0 0))
dtype: geometry

Filled with an empty geometry collection (the default).

>>> s.fillna()
0      POLYGON ((0 0, 1 1, 0 1, 0 0))
1            GEOMETRYCOLLECTION EMPTY
2    POLYGON ((0 0, -1 1, 0 -1, 0 0))
dtype: geometry

Filled with a specific polygon.

>>> s.fillna(Polygon([(0, 1), (2, 1), (1, 2)]))
0      POLYGON ((0 0, 1 1, 0 1, 0 0))
1      POLYGON ((0 1, 2 1, 1 2, 0 1))
2    POLYGON ((0 0, -1 1, 0 -1, 0 0))
dtype: geometry

Filled with another GeoSeries.

>>> from shapely.geometry import Point
>>> s_fill = GeoSeries(
...     [
...         Point(0, 0),
...         Point(1, 1),
...         Point(2, 2),
...     ]
... )
>>> s.fillna(s_fill)
0      POLYGON ((0 0, 1 1, 0 1, 0 0))
1                         POINT (1 1)
2    POLYGON ((0 0, -1 1, 0 -1, 0 0))
dtype: geometry
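
The three fill modes shown above can be summarized in a small pure-Python sketch. It is an illustration only: dicts of WKT strings stand in for GeoSeries, the helper fill_missing is hypothetical, and the pd.NA / np.nan case and the limit parameter are not modeled.

```python
EMPTY = "GEOMETRYCOLLECTION EMPTY"


def fill_missing(series, value=None):
    """Fill None entries of a {label: WKT or None} dict."""

    def replacement(label):
        if value is None:
            return EMPTY             # default: empty geometry collection
        if isinstance(value, dict):
            return value.get(label)  # series-like: look up the same label
        return value                 # single geometry: use it everywhere

    return {k: (replacement(k) if v is None else v) for k, v in series.items()}


s = {0: "POLYGON ((0 0, 1 1, 0 1, 0 0))", 1: None}
s_fill = {0: "POINT (0 0)", 1: "POINT (1 1)"}

print(fill_missing(s)[1])                 # GEOMETRYCOLLECTION EMPTY
print(fill_missing(s, "POINT (9 9)")[1])  # POINT (9 9)
print(fill_missing(s, s_fill)[1])         # POINT (1 1)
```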

See also

GeoSeries.isna

detect missing values

force_2d()[source]
force_3d(z=0)[source]
classmethod from_arrow(arr, **kwargs) GeoSeries[source]

Construct a GeoSeries from an Arrow array object with a GeoArrow extension type.

See https://geoarrow.org/ for details on the GeoArrow specification.

This function accepts any Arrow array object implementing the Arrow PyCapsule Protocol (i.e. having an __arrow_c_array__ method).

Note: Requires geopandas versions >= 1.0.0 to use with Sedona.

Parameters:
  • arr (pyarrow.Array, Arrow array) – Any array object implementing the Arrow PyCapsule Protocol (i.e. has an __arrow_c_array__ or __arrow_c_stream__ method). The type of the array should be one of the geoarrow geometry types.

  • **kwargs – Other parameters passed to the GeoSeries constructor.

Return type:

GeoSeries

See also

GeoSeries.to_arrow, GeoDataFrame.from_arrow

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> import geoarrow.pyarrow as ga
>>> array = ga.as_geoarrow([None, "POLYGON ((0 0, 1 1, 0 1, 0 0))", "LINESTRING (0 0, -1 1, 0 -1)"])
>>> geoseries = GeoSeries.from_arrow(array)
>>> geoseries
0                              None
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2      LINESTRING (0 0, -1 1, 0 -1)
dtype: geometry
classmethod from_file(filename: str, format: str | None = None, **kwargs) GeoSeries[source]

Alternate constructor to create a GeoSeries from a file.

Parameters:
  • filename (str) – File path or file handle to read from. If the path is a directory, Sedona will read all files in that directory.

  • format (str, optional) – The format of the file to read, by default None. If None, Sedona infers the format from the file extension. Note that format inference is not supported for directories. Available formats are “shapefile”, “geojson”, “geopackage”, and “geoparquet”.

  • table_name (str, optional) – The name of the table to read from a GeoPackage file, by default None. This is required if format is “geopackage”.

See also

GeoDataFrame.to_file

Write a GeoDataFrame to a file.

classmethod from_shapely(data, index=None, crs: Any | None = None, **kwargs) GeoSeries[source]
classmethod from_wkb(data, index=None, crs: Any | None = None, on_invalid='raise', **kwargs) GeoSeries[source]

Alternate constructor to create a GeoSeries from a list or array of WKB objects

Parameters:
  • data (array-like or Series) – Series, list or array of WKB objects

  • index (array-like or Index) – The index for the GeoSeries.

  • crs (value, optional) – Coordinate Reference System of the geometry objects. Can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (eg “EPSG:4326”) or a WKT string.

  • on_invalid ({"raise", "warn", "ignore"}, default "raise") –

    • raise: an exception will be raised if a WKB input geometry is invalid.

    • warn: a warning will be raised and invalid WKB geometries will be returned as None.

    • ignore: invalid WKB geometries will be returned as None without a warning.

    • fix: an effort is made to fix invalid input geometries (e.g. close unclosed rings). If this is not possible, they are returned as None without a warning. Requires GEOS >= 3.11 and shapely >= 2.1.

  • kwargs – Additional arguments passed to the Series constructor, e.g. name.

Return type:

GeoSeries

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> wkbs = [
... (
...     b"\x01\x01\x00\x00\x00\x00\x00\x00\x00"
...     b"\x00\x00\xf0?\x00\x00\x00\x00\x00\x00\xf0?"
... ),
... (
...     b"\x01\x01\x00\x00\x00\x00\x00\x00\x00"
...     b"\x00\x00\x00@\x00\x00\x00\x00\x00\x00\x00@"
... ),
... (
...    b"\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00"
...    b"\x00\x08@\x00\x00\x00\x00\x00\x00\x08@"
... ),
... ]
>>> s = GeoSeries.from_wkb(wkbs)
>>> s
0    POINT (1 1)
1    POINT (2 2)
2    POINT (3 3)
dtype: geometry
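
To see what the byte strings above contain, a 2D WKB point can be decoded with the standard library alone. The layout is one byte-order flag (1 for little-endian), a 4-byte geometry type (1 for Point), then two 8-byte doubles. The helper decode_wkb_point is hypothetical and handles only this simplest case.

```python
import struct


def decode_wkb_point(wkb):
    """Decode a 2D WKB point into an (x, y) tuple."""
    endian = "<" if wkb[0] == 1 else ">"  # byte-order flag
    (geom_type,) = struct.unpack_from(endian + "I", wkb, 1)
    if geom_type != 1:
        raise ValueError(f"not a 2D point: geometry type {geom_type}")
    return struct.unpack_from(endian + "dd", wkb, 5)


first = (
    b"\x01\x01\x00\x00\x00\x00\x00\x00\x00"
    b"\x00\x00\xf0?\x00\x00\x00\x00\x00\x00\xf0?"
)
print(decode_wkb_point(first))  # (1.0, 1.0)

# Round trip: encode POINT (3 3) and decode it again.
encoded = struct.pack("<BIdd", 1, 1, 3.0, 3.0)
print(decode_wkb_point(encoded))  # (3.0, 3.0)
```
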
classmethod from_wkt(data, index=None, crs: Any | None = None, on_invalid='raise', **kwargs) GeoSeries[source]

Alternate constructor to create a GeoSeries from a list or array of WKT objects

Parameters:
  • data (array-like, Series) – Series, list, or array of WKT objects

  • index (array-like or Index) – The index for the GeoSeries.

  • crs (value, optional) – Coordinate Reference System of the geometry objects. Can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (eg “EPSG:4326”) or a WKT string.

  • on_invalid ({"raise", "warn", "ignore"}, default "raise") –

    • raise: an exception will be raised if a WKT input geometry is invalid.

    • warn: a warning will be raised and invalid WKT geometries will be returned as None.

    • ignore: invalid WKT geometries will be returned as None without a warning.

    • fix: an effort is made to fix invalid input geometries (e.g. close unclosed rings). If this is not possible, they are returned as None without a warning. Requires GEOS >= 3.11 and shapely >= 2.1.

  • kwargs – Additional arguments passed to the Series constructor, e.g. name.

Return type:

GeoSeries

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> wkts = [
... 'POINT (1 1)',
... 'POINT (2 2)',
... 'POINT (3 3)',
... ]
>>> s = GeoSeries.from_wkt(wkts)
>>> s
0    POINT (1 1)
1    POINT (2 2)
2    POINT (3 3)
dtype: geometry
classmethod from_xy(x, y, z=None, index=None, crs=None, **kwargs) GeoSeries[source]

Alternate constructor to create a GeoSeries of Point geometries from lists or arrays of x, y(, z) coordinates

In case of geographic coordinates, it is assumed that longitude is captured by x coordinates and latitude by y.

Parameters:
  • x (iterable)

  • y (iterable)

  • z (iterable)

  • index (array-like or Index, optional) – The index for the GeoSeries. If not given and all coordinate inputs are Series with an equal index, that index is used.

  • crs (value, optional) – Coordinate Reference System of the geometry objects. Can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (eg “EPSG:4326”) or a WKT string.

  • **kwargs – Additional arguments passed to the Series constructor, e.g. name.

Return type:

GeoSeries

See also

GeoSeries.from_wkt, points_from_xy

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> x = [2.5, 5, -3.0]
>>> y = [0.5, 1, 1.5]
>>> s = GeoSeries.from_xy(x, y, crs="EPSG:4326")
>>> s
0    POINT (2.5 0.5)
1    POINT (5 1)
2    POINT (-3 1.5)
dtype: geometry
property geom_type: pyspark.pandas.Series

Returns a Series of strings specifying the geometry type of each object.

Note: Unlike Geopandas, Sedona returns LineString instead of LinearRing.

Returns:

A Series containing the geometry type of each geometry.

Return type:

Series

Examples

>>> from shapely.geometry import Polygon, Point
>>> from sedona.spark.geopandas import GeoSeries
>>> gs = GeoSeries([Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]), Point(0, 0)])
>>> gs.geom_type
0    POLYGON
1    POINT
dtype: object
property geometry: GeoSeries
get_geometry(index) GeoSeries[source]

Returns the n-th geometry from a collection of geometries (0-indexed).

If the index is non-negative, it returns the geometry at that index. If the index is negative, it counts backward from the end of the collection (e.g., -1 returns the last geometry). Returns None if the index is out of bounds.

Note: Simple geometries act as length-1 collections

Note: Using Shapely < 2.0 may lead to different results for empty simple geometries due to how Shapely interprets them.

Parameters:

index (int or array_like) – Position of a geometry to be retrieved within its collection

Return type:

GeoSeries

Notes

Simple geometries act as collections of length 1. Any out-of-range index value returns None.

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Point, MultiPoint, GeometryCollection, Polygon
>>> s = GeoSeries(
...     [
...         Point(0, 0),
...         MultiPoint([(0, 0), (1, 1), (0, 1), (1, 0)]),
...         GeometryCollection(
...             [MultiPoint([(0, 0), (1, 1), (0, 1), (1, 0)]), Point(0, 1)]
...         ),
...         Polygon(),
...         GeometryCollection(),
...     ]
... )
>>> s
0                                          POINT (0 0)
1              MULTIPOINT ((0 0), (1 1), (0 1), (1 0))
2    GEOMETRYCOLLECTION (MULTIPOINT ((0 0), (1 1), ...
3                                        POLYGON EMPTY
4                             GEOMETRYCOLLECTION EMPTY
dtype: geometry
>>> s.get_geometry(0)
0                                POINT (0 0)
1                                POINT (0 0)
2    MULTIPOINT ((0 0), (1 1), (0 1), (1 0))
3                              POLYGON EMPTY
4                                       None
dtype: geometry
>>> s.get_geometry(1)
0           None
1    POINT (1 1)
2    POINT (0 1)
3           None
4           None
dtype: geometry
>>> s.get_geometry(-1)
0      POINT (0 0)
1      POINT (1 0)
2      POINT (0 1)
3    POLYGON EMPTY
4             None
dtype: geometry
get_precision()[source]
property has_sindex

Check the existence of the spatial index without generating it.

Use the .sindex attribute on a GeoDataFrame or GeoSeries to generate a spatial index if it does not yet exist, which may take considerable time based on the underlying index implementation.

Note that the underlying spatial index may not be fully initialized until the first use.

Currently, sindex is not retained when accessing this property from a GeoDataFrame. You can work around this by first extracting the active geometry column as a GeoSeries and accessing the property on it.

Examples

>>> from shapely.geometry import Point
>>> s = GeoSeries([Point(x, x) for x in range(5)])
>>> s.has_sindex
False
>>> index = s.sindex
>>> s.has_sindex
True
Returns:

True if the spatial index has been generated or False if not.

Return type:

bool

property has_z: pyspark.pandas.Series

Returns a Series of dtype('bool') with value True for features that have a z-component.

Notes

Every operation in GeoPandas is planar, i.e. the potential third dimension is not taken into account.

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Point
>>> s = GeoSeries(
...     [
...         Point(0, 1),
...         Point(0, 1, 2),
...     ]
... )
>>> s
0        POINT (0 1)
1    POINT Z (0 1 2)
dtype: geometry
>>> s.has_z
0    False
1     True
dtype: bool
property interiors
intersection(other: GeoSeries | BaseGeometry, align: bool | None = None) GeoSeries[source]

Returns a GeoSeries of the intersection of points in each aligned geometry with other.

The operation works in a 1-to-1 row-wise manner.

Note: Unlike most functions, intersection may return results that are unordered with respect to the index. If the order matters, call sort_index() on the result.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to find the intersection with.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.

Return type:

GeoSeries

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(1, 0), (1, 3)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(1, 1),
...         Point(0, 1),
...     ],
...     index=range(1, 6),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2             LINESTRING (0 0, 2 2)
3             LINESTRING (2 0, 0 2)
4                       POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2             LINESTRING (1 0, 1 3)
3             LINESTRING (2 0, 0 2)
4                       POINT (1 1)
5                       POINT (0 1)
dtype: geometry

We can also do intersection of each geometry and a single shapely geometry:

>>> s.intersection(Polygon([(0, 0), (1, 1), (0, 1)]))
0    POLYGON ((0 0, 0 1, 1 1, 0 0))
1    POLYGON ((0 0, 0 1, 1 1, 0 0))
2             LINESTRING (0 0, 1 1)
3                       POINT (1 1)
4                       POINT (0 1)
dtype: geometry

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.intersection(s2, align=True)
0                              None
1    POLYGON ((0 0, 0 1, 1 1, 0 0))
2                       POINT (1 1)
3             LINESTRING (2 0, 0 2)
4                       POINT EMPTY
5                              None
dtype: geometry
>>> s.intersection(s2, align=False)
0    POLYGON ((0 0, 0 1, 1 1, 0 0))
1             LINESTRING (1 1, 1 2)
2                       POINT (1 1)
3                       POINT (1 1)
4                       POINT (0 1)
dtype: geometry
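The difference between align=True and align=False can be illustrated without Spark. The following is a minimal pure-Python sketch, not Sedona's implementation: the helper name pair_rows and the use of plain dicts (index label to value) as stand-ins for indexed series are assumptions made for illustration.

```python
def pair_rows(left, right, align=True):
    """Pair the values of two label->value dicts.

    align=True pairs values that share an index label and fills the
    missing side with None. align=False ignores the labels of `right`
    and pairs values by position, keeping the labels of `left`.
    """
    if align:
        labels = sorted(set(left) | set(right))
        return {k: (left.get(k), right.get(k)) for k in labels}
    return {k: (lv, rv) for (k, lv), rv in zip(left.items(), right.values())}


# Stand-ins for two series with index 0..4 and 1..5.
s = {0: "a0", 1: "a1", 2: "a2", 3: "a3", 4: "a4"}
s2 = {1: "b1", 2: "b2", 3: "b3", 4: "b4", 5: "b5"}

aligned = pair_rows(s, s2, align=True)
positional = pair_rows(s, s2, align=False)
```

In the aligned case, labels 0 and 5 have a partner on one side only, which is why the corresponding rows of the aligned result above are None.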

See also

GeoSeries.difference, GeoSeries.symmetric_difference, GeoSeries.union

intersection_all()[source]
intersects(other: GeoSeries | BaseGeometry, align: bool | None = None) pyspark.pandas.Series[source]

Returns a Series of dtype('bool') with value True for each aligned geometry that intersects other.

An object is said to intersect other if its boundary and interior intersect in any way with those of the other.

The operation works in a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test for intersection.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.

Return type:

Series (bool)

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         LineString([(1, 0), (1, 3)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(1, 1),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1             LINESTRING (0 0, 2 2)
2             LINESTRING (2 0, 0 2)
3                       POINT (0 1)
dtype: geometry
>>> s2
1    LINESTRING (1 0, 1 3)
2    LINESTRING (2 0, 0 2)
3              POINT (1 1)
4              POINT (0 1)
dtype: geometry

We can check if each geometry of the GeoSeries intersects a single geometry:

>>> line = LineString([(-1, 1), (3, 1)])
>>> s.intersects(line)
0    True
1    True
2    True
3    True
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.intersects(s2, align=True)
0    False
1     True
2     True
3    False
4    False
dtype: bool
>>> s.intersects(s2, align=False)
0    True
1    True
2    True
3    True
dtype: bool

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries intersects any element of the other one.

property is_ccw
property is_closed
property is_empty: pyspark.pandas.Series

Returns a Series of dtype('bool') with value True for empty geometries.

Examples

An example of a GeoSeries with one empty point, one point and one missing value:

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Point
>>> geoseries = GeoSeries([Point(), Point(2, 1), None], crs="EPSG:4326")
>>> geoseries
0    POINT EMPTY
1    POINT (2 1)
2           None
dtype: geometry
>>> geoseries.is_empty
0     True
1    False
2    False
dtype: bool

See also

GeoSeries.isna

detect missing geometries

property is_ring

Return a Series of dtype('bool') with value True for features that are closed.

When constructing a LinearRing, the sequence of coordinates may be explicitly closed by passing identical values in the first and last indices. Otherwise, the sequence will be implicitly closed by copying the first tuple to the last index.

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import LineString, LinearRing
>>> s = GeoSeries(
...     [
...         LineString([(0, 0), (1, 1), (1, -1)]),
...         LineString([(0, 0), (1, 1), (1, -1), (0, 0)]),
...         LinearRing([(0, 0), (1, 1), (1, -1)]),
...     ]
... )
>>> s
0         LINESTRING (0 0, 1 1, 1 -1)
1    LINESTRING (0 0, 1 1, 1 -1, 0 0)
2    LINEARRING (0 0, 1 1, 1 -1, 0 0)
dtype: geometry
>>> s.is_ring
0    False
1     True
2     True
dtype: bool
property is_simple: pyspark.pandas.Series

Returns a Series of dtype('bool') with value True for geometries that do not cross themselves.

This is meaningful only for LineStrings and LinearRings.
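To make "does not cross itself" concrete, the check can be sketched in pure Python with an orientation test on every pair of segments. This is an illustrative simplification under stated assumptions, not the algorithm Sedona uses: the function names are hypothetical, and the sketch only detects proper crossings (it ignores segments that merely touch or overlap collinearly).

```python
from itertools import combinations


def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def segments_cross(p1, p2, p3, p4):
    """True if segment p1-p2 and segment p3-p4 cross at an interior point."""
    d1 = cross(p3, p4, p1)
    d2 = cross(p3, p4, p2)
    d3 = cross(p1, p2, p3)
    d4 = cross(p1, p2, p4)
    # Endpoints of each segment must lie strictly on opposite sides of the other.
    return d1 * d2 < 0 and d3 * d4 < 0


def crosses_itself(coords):
    """True if any two segments of the line properly cross each other."""
    segments = list(zip(coords, coords[1:]))
    return any(segments_cross(*a, *b) for a, b in combinations(segments, 2))


crossing = crosses_itself([(0, 0), (1, 1), (1, -1), (0, 1)])
not_crossing = crosses_itself([(0, 0), (1, 1), (1, -1)])
```

Consecutive segments share an endpoint, so one of the products is zero and they are never reported as crossing.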

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import LineString
>>> s = GeoSeries(
...     [
...         LineString([(0, 0), (1, 1), (1, -1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, -1)]),
...     ]
... )
>>> s
0    LINESTRING (0 0, 1 1, 1 -1, 0 1)
1         LINESTRING (0 0, 1 1, 1 -1)
dtype: geometry
>>> s.is_simple
0    False
1     True
dtype: bool
property is_valid: pyspark.pandas.Series

Returns a Series of dtype('bool') with value True for geometries that are valid.

Examples

An example with one invalid polygon (a bowtie geometry crossing itself) and one missing geometry:

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         Polygon([(0,0), (1, 1), (1, 0), (0, 1)]),  # bowtie geometry
...         Polygon([(0, 0), (2, 2), (2, 0)]),
...         None
...     ]
... )
>>> s
0         POLYGON ((0 0, 1 1, 0 1, 0 0))
1    POLYGON ((0 0, 1 1, 1 0, 0 1, 0 0))
2         POLYGON ((0 0, 2 2, 2 0, 0 0))
3                                   None
dtype: geometry
>>> s.is_valid
0     True
1    False
2     True
3    False
dtype: bool

See also

GeoSeries.is_valid_reason

reason for invalidity

is_valid_reason() pyspark.pandas.Series[source]

Returns a Series of strings with the reason for invalidity of each geometry.

Examples

An example with two invalid polygons (a bowtie geometry crossing itself and a polygon whose ring touches itself) and one missing geometry:

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         Polygon([(0,0), (1, 1), (1, 0), (0, 1)]),  # bowtie geometry
...         Polygon([(0, 0), (2, 2), (2, 0)]),
...         Polygon([(0, 0), (2, 0), (1, 1), (2, 2), (0, 2), (1, 1), (0, 0)]),
...         None
...     ]
... )
>>> s
0                   POLYGON ((0 0, 1 1, 0 1, 0 0))
1              POLYGON ((0 0, 1 1, 1 0, 0 1, 0 0))
2                   POLYGON ((0 0, 2 2, 2 0, 0 0))
3    POLYGON ((0 0, 2 0, 1 1, 2 2, 0 2, 1 1, 0 0))
4                                             None
dtype: geometry
>>> s.is_valid_reason()
0    Valid Geometry
1    Self-intersection at or near point (0.5, 0.5, NaN)
2    Valid Geometry
3    Ring Self-intersection at or near point (1.0, 1.0)
4    None
dtype: object

See also

GeoSeries.is_valid

detect invalid geometries

GeoSeries.make_valid

fix invalid geometries

isna() pyspark.pandas.Series[source]

Detect missing values.

Returns:

A boolean Series of the same size as the GeoSeries, True where a value is NA.

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon
>>> s = GeoSeries(
...     [Polygon([(0, 0), (1, 1), (0, 1)]), None, Polygon([])]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1                              None
2                     POLYGON EMPTY
dtype: geometry
>>> s.isna()
0    False
1     True
2    False
dtype: bool

See also

GeoSeries.notna

inverse of isna

GeoSeries.is_empty

detect empty geometries

isnull() pyspark.pandas.Series[source]

Alias for isna method. See isna for more detail.

property length: pyspark.pandas.Series

Returns a Series containing the length of each geometry in the GeoSeries.

In the case of a (Multi)Polygon it measures the length of its exterior (i.e. perimeter).

For a GeometryCollection it sums the values for each of the individual geometries.

Returns:

A Series containing the length of each geometry.

Return type:

Series
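The rules above (zero for points, path length for lines, perimeter for polygons, sum of the parts for collections) can be checked by hand. The following is a minimal pure-Python sketch of that arithmetic, not Sedona's implementation; the helper name path_length is hypothetical and coordinates are plain tuples.

```python
import math


def path_length(coords):
    """Sum of the Euclidean lengths of consecutive segments."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


line = [(0, 0), (1, 1)]
# Exterior ring of the triangle, explicitly closed.
ring = [(0, 0), (1, 0), (1, 1), (0, 0)]

point_length = 0.0                   # a point has no length
line_length = path_length(line)      # length of the line
perimeter = path_length(ring)        # polygon: length of the exterior ring
collection_length = point_length + line_length + perimeter
```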

Examples

>>> from shapely.geometry import GeometryCollection, LineString, Point, Polygon
>>> from sedona.spark.geopandas import GeoSeries
>>> gs = GeoSeries([Point(0, 0), LineString([(0, 0), (1, 1)]), Polygon([(0, 0), (1, 0), (1, 1)]), GeometryCollection([Point(0, 0), LineString([(0, 0), (1, 1)]), Polygon([(0, 0), (1, 0), (1, 1)])])])
>>> gs.length
0    0.000000
1    1.414214
2    3.414214
3    4.828427
dtype: float64
line_merge(directed=False)[source]
property m: pyspark.pandas.Series
make_valid(*, method='linework', keep_collapsed=True) GeoSeries[source]

Repairs invalid geometries.

Returns a GeoSeries with valid geometries.

If the input geometry is already valid, then it will be preserved. In many cases, in order to create a valid geometry, the input geometry must be split into multiple parts or multiple geometries. If the geometry must be split into multiple parts of the same type to be made valid, then a multi-part geometry will be returned (e.g. a MultiPolygon). If the geometry must be split into multiple parts of different types to be made valid, then a GeometryCollection will be returned.

In Sedona, only the ‘structure’ method is available:

  • the ‘structure’ algorithm tries to reason from the structure of the input to find the ‘correct’ repair: exterior rings bound area, interior holes exclude area. It first makes all rings valid, then merges shells and subtracts holes from the shells to generate a valid result. It assumes that holes and shells are correctly categorized in the input geometry.

Parameters:
  • method ({'linework', 'structure'}, default 'linework') – Algorithm to use when repairing geometry. Sedona Geopandas only supports the ‘structure’ method. The default method is “linework” to match compatibility with Geopandas, but it must be explicitly set to ‘structure’ to use the Sedona implementation.

  • keep_collapsed (bool, default True) – For the ‘structure’ method, True will keep components that have collapsed into a lower dimensionality. For example, a ring collapsing to a line, or a line collapsing to a point.

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import MultiPolygon, Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (0, 2), (1, 1), (2, 2), (2, 0), (1, 1), (0, 0)]),
...         Polygon([(0, 2), (0, 1), (2, 0), (0, 0), (0, 2)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...     ],
... )
>>> s
0    POLYGON ((0 0, 0 2, 1 1, 2 2, 2 0, 1 1, 0 0))
1              POLYGON ((0 2, 0 1, 2 0, 0 0, 0 2))
2                       LINESTRING (0 0, 1 1, 1 0)
dtype: geometry
>>> s.make_valid(method="structure")
0    MULTIPOLYGON (((1 1, 0 0, 0 2, 1 1)), ((2 0, 1...
1                       POLYGON ((0 1, 2 0, 0 0, 0 1))
2                           LINESTRING (0 0, 1 1, 1 0)
dtype: geometry
minimum_bounding_circle()[source]
minimum_bounding_radius()[source]
minimum_clearance()[source]
minimum_rotated_rectangle()[source]
normalize()[source]
notna() pyspark.pandas.Series[source]

Detect non-missing values.

Returns:

A boolean Series of the same size as the GeoSeries, False where a value is NA.

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon
>>> s = GeoSeries(
...     [Polygon([(0, 0), (1, 1), (0, 1)]), None, Polygon([])]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1                              None
2                     POLYGON EMPTY
dtype: geometry
>>> s.notna()
0     True
1    False
2     True
dtype: bool

See also

GeoSeries.isna

inverse of notna

GeoSeries.is_empty

detect empty geometries

notnull() pyspark.pandas.Series[source]

Alias for notna method. See notna for more detail.

offset_curve(distance, quad_segs=8, join_style='round', mitre_limit=5.0)[source]
overlaps(other, align=None) pyspark.pandas.Series[source]

Returns True for all aligned geometries that overlap other, else False.

In GeoPandas, geometries overlap if they have more than one but not all points in common, have the same dimension, and the intersection of the interiors of the geometries has the same dimension as the geometries themselves.

In Sedona, however, the method also returns True when the geometries' points match.

Note: Sedona’s behavior may also differ from Geopandas for GeometryCollections.

The operation works in a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test for overlap.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.

Return type:

Series (bool)

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, MultiPoint, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         MultiPoint([(0, 0), (0, 1)]),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 0), (0, 2)]),
...         LineString([(0, 1), (1, 1)]),
...         LineString([(1, 1), (3, 3)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )

We can check if each geometry of GeoSeries overlaps a single geometry:

>>> polygon = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
>>> s.overlaps(polygon)
0     True
1     True
2    False
3    False
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We align both GeoSeries based on index values and compare elements with the same index.

>>> s.overlaps(s2)
0    False
1     True
2    False
3    False
4    False
dtype: bool
>>> s.overlaps(s2, align=False)
0     True
1    False
2     True
3    False
dtype: bool

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries overlaps any element of the other one.

plot(*args, **kwargs)[source]

Plot a GeoSeries.

Generate a plot of a GeoSeries geometry with matplotlib.

Note: This method is not scalable and requires collecting all data to the driver.

Parameters:
  • s (Series) – The GeoSeries to be plotted. Currently Polygon, MultiPolygon, LineString, MultiLineString, Point and MultiPoint geometries can be plotted.

  • cmap (str (default None)) –

    The name of a colormap recognized by matplotlib. Any colormap will work, but categorical colormaps are generally recommended. Examples of useful discrete colormaps include:

    tab10, tab20, Accent, Dark2, Paired, Pastel1, Set1, Set2

  • color (str, np.array, pd.Series, List (default None)) – If specified, all objects will be colored uniformly.

  • ax (matplotlib.pyplot.Artist (default None)) – axes on which to draw the plot

  • figsize (pair of floats (default None)) – Size of the resulting matplotlib.figure.Figure. If the argument ax is given explicitly, figsize is ignored.

  • aspect ('auto', 'equal', None or float (default 'auto')) – Set aspect of axis. If ‘auto’, the default aspect for map plots is ‘equal’; if however data are not projected (coordinates are long/lat), the aspect is by default set to 1/cos(s_y * pi/180) with s_y the y coordinate of the middle of the GeoSeries (the mean of the y range of bounding box) so that a long/lat square appears square in the middle of the plot. This implies an Equirectangular projection. If None, the aspect of ax won’t be changed. It can also be set manually (float) as the ratio of y-unit to x-unit.

  • autolim (bool (default True)) – Update axes data limits to contain the new geometries.

  • **style_kwds (dict) – Color options to be passed on to the actual plot function, such as edgecolor, facecolor, linewidth, markersize, alpha.
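The default aspect for unprojected (long/lat) data described under the aspect parameter is a one-line formula. The following is a minimal sketch of that arithmetic only; the function name lonlat_aspect is hypothetical and it does not call any plotting code.

```python
import math


def lonlat_aspect(miny, maxy):
    """Aspect 1 / cos(s_y * pi / 180), with s_y the middle of the y range."""
    s_y = (miny + maxy) / 2
    return 1 / math.cos(s_y * math.pi / 180)


equator_aspect = lonlat_aspect(-10, 10)  # data centred on the equator
north_aspect = lonlat_aspect(59, 61)     # data centred on 60 degrees north
```

At the equator the aspect is 1, so one degree of longitude is drawn as wide as one degree of latitude. At 60 degrees north the aspect is about 2, which compensates for meridians converging towards the pole.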

Returns:

ax

Return type:

matplotlib axes instance

remove_repeated_points(tolerance=0.0)[source]
representative_point()[source]
reverse()[source]
segmentize(max_segment_length)[source]

Returns a GeoSeries with vertices added to line segments based on maximum segment length.

Additional vertices will be added to every line segment in an input geometry so that segments are no longer than the provided maximum segment length. New vertices will evenly subdivide each segment. Only linear components of input geometries are densified; other geometries are returned unmodified.

Parameters:

max_segment_length (float | array-like) – Additional vertices will be added so that all line segments are no longer than this value. Must be greater than 0.
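The even subdivision described above can be sketched in pure Python. This is an illustration of the idea under stated assumptions, not Sedona's implementation: the function name densify is hypothetical, coordinates are plain tuples, and each segment is split into the smallest number of equal pieces that are no longer than max_segment_length.

```python
import math


def densify(coords, max_segment_length):
    """Insert evenly spaced vertices so no segment exceeds the given length."""
    if max_segment_length <= 0:
        raise ValueError("max_segment_length must be greater than 0")
    out = [coords[0]]
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        # Smallest number of equal pieces that respects the maximum length.
        pieces = max(1, math.ceil(length / max_segment_length))
        for i in range(1, pieces + 1):
            t = i / pieces
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


line = densify([(0, 0), (0, 10)], 5)
ring = densify([(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)], 5)
```

For the line from (0, 0) to (0, 10) with a maximum segment length of 5, a single vertex is added at (0, 5).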

Return type:

GeoSeries

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString
>>> s = GeoSeries(
...     [
...         LineString([(0, 0), (0, 10)]),
...         Polygon([(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]),
...     ],
... )
>>> s
0                     LINESTRING (0 0, 0 10)
1    POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))
dtype: geometry
>>> s.segmentize(max_segment_length=5)
0                          LINESTRING (0 0, 0 5, 0 10)
1    POLYGON ((0 0, 5 0, 10 0, 10 5, 10 10, 5 10, 0...
dtype: geometry
set_crs(crs: Any | None = None, epsg: int | None = None, inplace: Literal[True] = True, allow_override: bool = False) None[source]
set_crs(crs: Any | None = None, epsg: int | None = None, inplace: Literal[False] = False, allow_override: bool = False) GeoSeries

Set the Coordinate Reference System (CRS) of a GeoSeries.

Pass None to remove CRS from the GeoSeries.

Notes

The underlying geometries are not transformed to this CRS. To transform the geometries to a new CRS, use the to_crs method.

Parameters:
  • crs (pyproj.CRS | None, optional) – The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g. “EPSG:4326”) or a WKT string.

  • epsg (int, optional if crs is specified) – EPSG code specifying the projection.

  • inplace (bool, default False) – If True, the CRS of the GeoSeries will be changed in place (while still returning the result) instead of making a copy of the GeoSeries.

  • allow_override (bool, default True) – If the GeoSeries already has a CRS, allow to replace the existing CRS, even when both are not equal. In Sedona, setting this to True will lead to eager evaluation instead of lazy evaluation. Unlike Geopandas, True is the default value in Sedona for performance reasons.

Return type:

GeoSeries

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Point
>>> s = GeoSeries([Point(1, 1), Point(2, 2), Point(3, 3)])
>>> s
0    POINT (1 1)
1    POINT (2 2)
2    POINT (3 3)
dtype: geometry

Setting CRS to a GeoSeries without one:

>>> s.crs is None
True
>>> s = s.set_crs('epsg:3857')
>>> s.crs
<Projected CRS: EPSG:3857>
Name: WGS 84 / Pseudo-Mercator
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: World - 85°S to 85°N
- bounds: (-180.0, -85.06, 180.0, 85.06)
Coordinate Operation:
- name: Popular Visualisation Pseudo-Mercator
- method: Popular Visualisation Pseudo Mercator
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

Overriding existing CRS:

>>> s = s.set_crs(4326, allow_override=True)

Without allow_override=True, set_crs returns an error if you try to override CRS.

See also

GeoSeries.to_crs

re-project to another CRS

set_precision(grid_size, mode='valid_output')[source]
simplify(tolerance=None, preserve_topology=True) GeoSeries[source]

Returns a GeoSeries containing a simplified representation of each geometry.

The algorithm (Douglas-Peucker) recursively splits the original line into smaller parts and connects these parts’ endpoints by a straight line. Then, it removes all points whose distance to the straight line is smaller than tolerance. It does not move any points and it always preserves endpoints of the original line or polygon. See https://shapely.readthedocs.io/en/latest/manual.html#object.simplify for details.

Simplifies individual geometries independently, without considering the topology of a potential polygonal coverage. If you would like to treat the GeoSeries as a coverage and simplify its edges, while preserving the coverage topology, see simplify_coverage().
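The Douglas-Peucker recursion described above can be sketched in a few lines of pure Python. This is a textbook version for a single line, written for illustration: it is not the code Sedona or Shapely runs, it does not preserve topology, the function names are hypothetical, and treating a distance equal to the tolerance as removable is an assumption of this sketch.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection of p onto the segment, clamped to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(coords, tolerance):
    """Simplify a list of (x, y) tuples, always keeping both endpoints."""
    if len(coords) < 3:
        return list(coords)
    first, last = coords[0], coords[-1]
    # Find the interior point farthest from the straight line first-last.
    index, farthest = max(
        ((i, point_segment_distance(coords[i], first, last))
         for i in range(1, len(coords) - 1)),
        key=lambda item: item[1],
    )
    if farthest <= tolerance:
        return [first, last]
    # Otherwise split at the farthest point and simplify both halves.
    left = douglas_peucker(coords[: index + 1], tolerance)
    right = douglas_peucker(coords[index:], tolerance)
    return left[:-1] + right


coarse = douglas_peucker([(0, 0), (1, 10), (0, 20)], 1)
fine = douglas_peucker([(0, 0), (1, 10), (0, 20)], 0.5)
```

With a tolerance of 1 the middle vertex, which lies exactly 1 unit from the line joining the endpoints, is dropped. With a tolerance of 0.5 it is kept.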

Parameters:
  • tolerance (float) – All parts of a simplified geometry will be no more than tolerance distance from the original. It has the same units as the coordinate reference system of the GeoSeries. For example, using tolerance=100 in a projected CRS with meters as units means a distance of 100 meters in reality.

  • preserve_topology (bool (default True)) – False uses a quicker algorithm, but may produce self-intersecting or otherwise invalid geometries.

Notes

Invalid geometric objects may result from simplification that does not preserve topology and simplification may be sensitive to the order of coordinates: two geometries differing only in order of coordinates may be simplified differently.

See also

simplify_coverage

simplify geometries using coverage simplification

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Point, LineString
>>> s = GeoSeries(
...     [Point(0, 0).buffer(1), LineString([(0, 0), (1, 10), (0, 20)])]
... )
>>> s
0    POLYGON ((1 0, 0.99518 -0.09802, 0.98079 -0.19...
1                         LINESTRING (0 0, 1 10, 0 20)
dtype: geometry
>>> s.simplify(1)
0    POLYGON ((0 1, 0 -1, -1 0, 0 1))
1              LINESTRING (0 0, 0 20)
dtype: geometry
property sindex: SpatialIndex

Returns a spatial index for the GeoSeries.

Note that the spatial index may not be fully initialized until the first use.

Currently, sindex is not retained when accessing this property from a GeoDataFrame. You can work around this by first extracting the active geometry column as a GeoSeries and accessing the property on it.

Returns:

The spatial index.

Return type:

SpatialIndex

Examples

>>> from shapely.geometry import Point, box
>>> from sedona.spark.geopandas import GeoSeries
>>>
>>> s = GeoSeries([Point(x, x) for x in range(5)])
>>> s.sindex.query(box(1, 1, 3, 3))
[Point(1, 1), Point(2, 2), Point(3, 3)]
>>> s.has_sindex
True
snap(other, tolerance, align=None) GeoSeries[source]

Snap the vertices and segments of the geometry to vertices of the reference.

Vertices and segments of the input geometry are snapped to vertices of the reference geometry, returning a new geometry; the input geometries are not modified. The result geometry is the input geometry with the vertices and segments snapped. If no snapping occurs then the input geometry is returned unchanged. The tolerance is used to control where snapping is performed.

Where possible, this operation tries to avoid creating invalid geometries; however, it does not guarantee that output geometries will be valid. It is the responsibility of the caller to check for and handle invalid geometries.

Because too much snapping can result in invalid geometries being created, heuristics are used to determine the number and location of snapped vertices that are likely safe to snap. These heuristics may omit some potential snaps that are otherwise within the tolerance.
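The role of the tolerance can be illustrated with a deliberately reduced sketch in pure Python. It only moves input vertices onto the nearest reference vertex when that vertex is within the tolerance. It does not snap segments and applies none of the safety heuristics mentioned above, so it is not equivalent to snap(); the function name snap_vertices is hypothetical.

```python
import math


def snap_vertices(coords, reference, tolerance):
    """Move each vertex to the nearest reference vertex if within tolerance."""
    snapped = []
    for p in coords:
        nearest = min(reference, key=lambda r: math.dist(p, r))
        snapped.append(nearest if math.dist(p, nearest) <= tolerance else p)
    return snapped


reference = [(0, 2)]
point = snap_vertices([(0.5, 2.5)], reference, 1)
line = snap_vertices([(0.1, 0.1), (0.49, 0.51), (1.01, 0.89)], reference, 1)
```

The point at (0.5, 2.5) is about 0.71 units from (0, 2) and is moved onto it. Every vertex of the line is more than 1 unit away, so the line is returned unchanged.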

Note: Sedona’s result may differ slightly from geopandas’s snap() result because of small differences between the underlying engines being used.

The operation works in a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to snap to.

  • tolerance (float or array-like) – Maximum distance between vertices that shall be snapped.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.

Return type:

GeoSeries

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Point(0.5, 2.5),
...         LineString([(0.1, 0.1), (0.49, 0.51), (1.01, 0.89)]),
...         Polygon([(0, 0), (0, 10), (10, 10), (10, 0), (0, 0)]),
...     ],
... )
>>> s
0                               POINT (0.5 2.5)
1    LINESTRING (0.1 0.1, 0.49 0.51, 1.01 0.89)
2       POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0))
dtype: geometry
>>> s2 = GeoSeries(
...     [
...         Point(0, 2),
...         LineString([(0, 0), (0.5, 0.5), (1.0, 1.0)]),
...         Point(8, 10),
...     ],
...     index=range(1, 4),
... )
>>> s2
1                       POINT (0 2)
2    LINESTRING (0 0, 0.5 0.5, 1 1)
3                      POINT (8 10)
dtype: geometry

We can snap each geometry to a single shapely geometry:

>>> s.snap(Point(0, 2), tolerance=1)
0                                     POINT (0 2)
1      LINESTRING (0.1 0.1, 0.49 0.51, 1.01 0.89)
2    POLYGON ((0 0, 0 2, 0 10, 10 10, 10 0, 0 0))
dtype: geometry

We can also snap two GeoSeries to each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and snap elements with the same index using align=True or ignore index and snap elements based on their matching order using align=False:

>>> s.snap(s2, tolerance=1, align=True)
0                                                 None
1           LINESTRING (0.1 0.1, 0.49 0.51, 1.01 0.89)
2    POLYGON ((0.5 0.5, 1 1, 0 10, 10 10, 10 0, 0.5...
3                                                 None
dtype: geometry
>>> s.snap(s2, tolerance=1, align=False)
0                                      POINT (0 2)
1                   LINESTRING (0 0, 0.5 0.5, 1 1)
2    POLYGON ((0 0, 0 10, 8 10, 10 10, 10 0, 0 0))
dtype: geometry
to_arrow(geometry_encoding='WKB', interleaved=True, include_z=None)[source]

Encode a GeoSeries to GeoArrow format.

See https://geoarrow.org/ for details on the GeoArrow specification.

This function returns a generic Arrow array object implementing the Arrow PyCapsule Protocol (i.e. having an __arrow_c_array__ method). This object can then be consumed by your Arrow implementation of choice that supports this protocol.

Note: Requires geopandas versions >= 1.0.0 to use with Sedona.

Parameters:
  • geometry_encoding ({'WKB', 'geoarrow'}, default 'WKB') – The GeoArrow encoding to use for the data conversion.

  • interleaved (bool, default True) – Only relevant for ‘geoarrow’ encoding. If True, the geometries’ coordinates are interleaved in a single fixed size list array. If False, the coordinates are stored as separate arrays in a struct type.

  • include_z (bool, default None) – Only relevant for ‘geoarrow’ encoding (for WKB, the dimensionality of the individual geometries is preserved). If False, return 2D geometries. If True, include the third dimension in the output (if a geometry has no third dimension, the z-coordinates will be NaN). By default, will infer the dimensionality from the input geometries. Note that this inference can be unreliable with empty geometries (for a guaranteed result, it is recommended to specify the keyword).

Returns:

A generic Arrow array object with geometry data encoded to GeoArrow.

Return type:

GeoArrowArray

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Point
>>> gser = GeoSeries([Point(1, 2), Point(2, 1)])
>>> gser
0    POINT (1 2)
1    POINT (2 1)
dtype: geometry
>>> arrow_array = gser.to_arrow()
>>> arrow_array
<geopandas.io._geoarrow.GeoArrowArray object at ...>

The returned array object needs to be consumed by a library implementing the Arrow PyCapsule Protocol. For example, wrapping the data as a pyarrow.Array (requires pyarrow >= 14.0):

>>> import pyarrow as pa
>>> array = pa.array(arrow_array)
>>> array
<pyarrow.lib.BinaryArray object at ...>
[
  0101000000000000000000F03F0000000000000040,
  01010000000000000000000040000000000000F03F
]
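The two hexadecimal strings printed above are WKB-encoded points. As a minimal sketch, assuming 2D points in the standard WKB layout (one byte-order flag, a 32-bit geometry type, then two 64-bit doubles), they can be decoded with the standard library alone. The helper decode_wkb_point is hypothetical and is not part of Sedona, geopandas, or pyarrow:

```python
import struct


def decode_wkb_point(hex_wkb: str) -> tuple:
    """Decode a hex-encoded 2D WKB point into an (x, y) tuple."""
    raw = bytes.fromhex(hex_wkb)
    # First byte: 1 means little-endian, 0 means big-endian.
    byte_order = "<" if raw[0] == 1 else ">"
    # Next four bytes: geometry type code (1 is Point).
    (geom_type,) = struct.unpack(byte_order + "I", raw[1:5])
    if geom_type != 1:
        raise ValueError(f"expected a 2D Point (type 1), got type {geom_type}")
    # Remaining sixteen bytes: x and y as doubles.
    x, y = struct.unpack(byte_order + "dd", raw[5:21])
    return (x, y)


# The two values shown in the pyarrow output above.
points = [
    decode_wkb_point("0101000000000000000000F03F0000000000000040"),
    decode_wkb_point("01010000000000000000000040000000000000F03F"),
]
print(points)  # [(1.0, 2.0), (2.0, 1.0)]
```

The decoded coordinates correspond to POINT (1 2) and POINT (2 1) from the input GeoSeries.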
to_crs(crs: Any | None = None, epsg: int | None = None) GeoSeries[source]

Returns a GeoSeries with all geometries transformed to a new coordinate reference system.

Transform all geometries in a GeoSeries to a different coordinate reference system. The crs attribute on the current GeoSeries must be set. Either crs or epsg may be specified for output.

This method will transform all points in all objects. It has no notion of projecting entire geometries. All segments joining points are assumed to be lines in the current projection, not geodesics. Objects crossing the dateline (or other projection boundary) will have undesirable behavior.

Parameters:
  • crs (pyproj.CRS, optional if epsg is specified) – The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g., “EPSG:4326”) or a WKT string.

  • epsg (int, optional if crs is specified) – EPSG code specifying output projection.

Return type:

GeoSeries

Examples

>>> from shapely.geometry import Point
>>> from sedona.spark.geopandas import GeoSeries
>>> geoseries = GeoSeries([Point(1, 1), Point(2, 2), Point(3, 3)], crs=4326)
>>> geoseries.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
>>> geoseries = geoseries.to_crs(3857)
>>> print(geoseries)
0    POINT (111319.491 111325.143)
1    POINT (222638.982 222684.209)
2    POINT (333958.472 334111.171)
dtype: geometry
>>> geoseries.crs
<Projected CRS: EPSG:3857>
Name: WGS 84 / Pseudo-Mercator
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: World - 85°S to 85°N
- bounds: (-180.0, -85.06, 180.0, 85.06)
Coordinate Operation:
- name: Popular Visualisation Pseudo-Mercator
- method: Popular Visualisation Pseudo Mercator
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
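The projected coordinates in the example can be reproduced by hand, which also illustrates that the transformation is applied point by point. The following is a sketch only, assuming the spherical Mercator formulas with a sphere radius of 6378137 m; the function name lonlat_to_web_mercator is hypothetical, and this is not a description of how Sedona performs the transformation internally:

```python
import math

# Sphere radius in metres (the WGS 84 semi-major axis), an assumption of this sketch.
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg: float, lat_deg: float) -> tuple:
    """Project a longitude/latitude pair in degrees to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return (x, y)


# Same inputs as the example: Point(1, 1), Point(2, 2), Point(3, 3).
for lon, lat in [(1, 1), (2, 2), (3, 3)]:
    x, y = lonlat_to_web_mercator(lon, lat)
    print(f"POINT ({x:.3f} {y:.3f})")
```

Each vertex is converted independently, so a long segment between two vertices stays a straight line in the target projection rather than following a geodesic.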
to_file(path: str, driver: str | None = None, schema: dict | None = None, index: bool | None = None, **kwargs)[source]

Write the GeoSeries to a file.

Parameters:
  • path (str) – File path or file handle to write to.

  • driver (str, optional) – The format driver used to write the file, by default None. If not specified, it’s inferred from the file extension. Available formats are “geojson”, “geopackage”, and “geoparquet”.

  • index (bool, optional) – If True, writes the index as a column. If False, no index is written. By default None, the index is written only if it is named, is a MultiIndex, or has a non-integer data type.

  • mode (str, default 'w') – The write mode: ‘w’ to overwrite the existing file or ‘a’ to append.

  • crs (pyproj.CRS, optional) – The coordinate reference system to write. If None, it is determined from the GeoSeries crs attribute. The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g., “EPSG:4326”) or a WKT string.

  • **kwargs – Additional keyword arguments passed to the underlying writing engine.

Examples

>>> from shapely.geometry import Point, LineString
>>> from sedona.spark.geopandas import GeoSeries
>>> # Note: Examples write to temporary files for demonstration
>>> import tempfile
>>> import os

Create a GeoSeries:

>>> gs = GeoSeries(
...     [Point(0, 0), LineString([(1, 1), (2, 2)])],
...     index=["a", "b"]
... )

Save to a GeoParquet file:

>>> path_parquet = os.path.join(tempfile.gettempdir(), "data.parquet")
>>> gs.to_file(path_parquet, driver="geoparquet")

Append to a GeoJSON file:

>>> path_json = os.path.join(tempfile.gettempdir(), "data.json")
>>> gs.to_file(path_json, driver="geojson", mode='a')

to_geoframe(name=None)[source]
to_geopandas() GeoSeries[source]

Convert the GeoSeries to a geopandas GeoSeries.

Returns:

A geopandas GeoSeries.

Return type:

geopandas.GeoSeries

to_json(show_bbox: bool = True, drop_id: bool = False, to_wgs84: bool = False, **kwargs) str[source]

Returns a GeoJSON string representation of the GeoSeries.

Parameters:
  • show_bbox (bool, optional, default: True) – Include bbox (bounds) in the geojson

  • drop_id (bool, default: False) – If True, the index of the GeoSeries is not written as the id property in the generated GeoJSON. The default is False, which retains it; True may be preferable if the index is just arbitrary row numbers.

  • to_wgs84 (bool, optional, default: False) –

    If the CRS is set on the active geometry column it is exported as WGS84 (EPSG:4326) to meet the 2016 GeoJSON specification. Set to True to force re-projection and set to False to ignore CRS. False by default.

  • **kwargs – Keyword arguments that will be passed to json.dumps().

Note: Unlike geopandas, Sedona's implementation will replace 'LinearRing' with 'LineString' in the GeoJSON output.

Return type:

JSON string

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Point
>>> s = GeoSeries([Point(1, 1), Point(2, 2), Point(3, 3)])
>>> s
0    POINT (1 1)
1    POINT (2 2)
2    POINT (3 3)
dtype: geometry
>>> s.to_json()
'{"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature", "properties": {}, "geometry": {"type": "Point", "coordinates": [1.0, 1.0]}, "bbox": [1.0, 1.0, 1.0, 1.0]}, {"id": "1", "type": "Feature", "properties": {}, "geometry": {"type": "Point", "coordinates": [2.0, 2.0]}, "bbox": [2.0, 2.0, 2.0, 2.0]}, {"id": "2", "type": "Feature", "properties": {}, "geometry": {"type": "Point", "coordinates": [3.0, 3.0]}, "bbox": [3.0, 3.0, 3.0, 3.0]}], "bbox": [1.0, 1.0, 3.0, 3.0]}'

See also

GeoSeries.to_file

write GeoSeries to file
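Because to_json returns a plain string, the result can be consumed with the standard json module. The sketch below parses a copy of the output shown in the example above and pulls out the feature ids, coordinates, and the collection-level bbox. The variable names are illustrative only:

```python
import json

# A copy of the string returned by s.to_json() in the example above.
geojson_text = (
    '{"type": "FeatureCollection", "features": ['
    '{"id": "0", "type": "Feature", "properties": {}, '
    '"geometry": {"type": "Point", "coordinates": [1.0, 1.0]}, '
    '"bbox": [1.0, 1.0, 1.0, 1.0]}, '
    '{"id": "1", "type": "Feature", "properties": {}, '
    '"geometry": {"type": "Point", "coordinates": [2.0, 2.0]}, '
    '"bbox": [2.0, 2.0, 2.0, 2.0]}, '
    '{"id": "2", "type": "Feature", "properties": {}, '
    '"geometry": {"type": "Point", "coordinates": [3.0, 3.0]}, '
    '"bbox": [3.0, 3.0, 3.0, 3.0]}], '
    '"bbox": [1.0, 1.0, 3.0, 3.0]}'
)

collection = json.loads(geojson_text)

# With drop_id=False (the default) each feature carries the index as "id".
feature_ids = [feature["id"] for feature in collection["features"]]
coordinates = [tuple(feature["geometry"]["coordinates"]) for feature in collection["features"]]

print(feature_ids)         # ['0', '1', '2']
print(coordinates)         # [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
print(collection["bbox"])  # [1.0, 1.0, 3.0, 3.0]
```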

to_parquet(path, **kwargs)[source]

Write the GeoSeries to a GeoParquet file.

Parameters:
  • path (str) – The file path where the GeoParquet file will be written.

  • **kwargs – Additional keyword arguments passed to the underlying writing function.

Return type:

None

Examples

>>> from shapely.geometry import Point
>>> from sedona.spark.geopandas import GeoSeries
>>> import tempfile
>>> import os
>>> gs = GeoSeries([Point(1, 1), Point(2, 2)])
>>> file_path = os.path.join(tempfile.gettempdir(), "my_geodata.parquet")
>>> gs.to_parquet(file_path)
to_spark_pandas() pyspark.pandas.Series[source]
to_wkb(hex: bool = False, **kwargs) pyspark.pandas.Series[source]

Convert GeoSeries geometries to WKB

Parameters:
  • hex (bool) – If true, export the WKB as a hexadecimal string. The default is to return a binary bytes object.

  • kwargs – Additional keyword args will be passed to shapely.to_wkb().

Returns:

WKB representations of the geometries

Return type:

Series

See also

GeoSeries.to_wkt

Examples

>>> from shapely.geometry import Point, Polygon
>>> from sedona.spark.geopandas import GeoSeries
>>> s = GeoSeries(
...     [
...         Point(0, 0),
...         Polygon(),
...         Polygon([(0, 0), (1, 1), (1, 0)]),
...         None,
...     ]
... )
>>> s.to_wkb()
0    b'...
1              b''
2    b'...
3                                                 None
dtype: object
>>> s.to_wkb(hex=True)
0           010100000000000000000000000000000000000000
1                                   010300000000000000
2    0103000000010000000400000000000000000000000000...
3                                                 None
dtype: object
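The hexadecimal strings in rows 0 and 1 follow the standard WKB layout and can be rebuilt by hand. This is a minimal sketch, assuming little-endian 2D WKB; the helper names are hypothetical and this is not how Sedona or shapely produce the encoding:

```python
import struct


def point_wkb_hex(x: float, y: float) -> str:
    """Hex WKB for a 2D point: byte-order flag, type code 1, then x and y."""
    return struct.pack("<BIdd", 1, 1, x, y).hex().upper()


def empty_polygon_wkb_hex() -> str:
    """Hex WKB for an empty polygon: byte-order flag, type code 3, zero rings."""
    return struct.pack("<BII", 1, 3, 0).hex().upper()


print(point_wkb_hex(0.0, 0.0))   # 21 bytes: 1 flag + 4 type + 16 coordinates
print(empty_polygon_wkb_hex())   # 9 bytes: 1 flag + 4 type + 4 ring count
```

With hex=False the same bytes are returned as a bytes object rather than as a hexadecimal string.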
to_wkt(**kwargs) pyspark.pandas.Series[source]

Convert GeoSeries geometries to WKT

Note: Using shapely < 1.0.0 may return different geometries for empty geometries.

Parameters:

kwargs – Keyword args will be passed to shapely.to_wkt().

Returns:

WKT representations of the geometries

Return type:

Series

Examples

>>> from shapely.geometry import Point
>>> from sedona.spark.geopandas import GeoSeries
>>> s = GeoSeries([Point(1, 1), Point(2, 2), Point(3, 3)])
>>> s
0    POINT (1 1)
1    POINT (2 2)
2    POINT (3 3)
dtype: geometry
>>> s.to_wkt()
0    POINT (1 1)
1    POINT (2 2)
2    POINT (3 3)
dtype: object

See also

GeoSeries.to_wkb

property total_bounds

Returns a tuple containing minx, miny, maxx, maxy values for the bounds of the series as a whole.

See GeoSeries.bounds for the bounds of the geometries contained in the series.

Examples

>>> from shapely.geometry import Point, Polygon, LineString
>>> d = {'geometry': [Point(3, -1), Polygon([(0, 0), (1, 1), (1, 0)]),
... LineString([(0, 1), (1, 2)])]}
>>> gdf = geopandas.GeoDataFrame(d, crs="EPSG:4326")
>>> gdf.total_bounds
array([ 0., -1.,  3.,  2.])
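The relationship between bounds and total_bounds can be sketched in plain Python: the total bounds are the minimum of the per-geometry minima and the maximum of the per-geometry maxima. The per-geometry tuples below were written out by hand for the three geometries in the example, and the function name is illustrative only:

```python
def total_bounds(per_geometry_bounds):
    """Reduce (minx, miny, maxx, maxy) tuples to one overall bounding box."""
    minxs, minys, maxxs, maxys = zip(*per_geometry_bounds)
    return (min(minxs), min(minys), max(maxxs), max(maxys))


per_geometry_bounds = [
    (3.0, -1.0, 3.0, -1.0),  # Point(3, -1)
    (0.0, 0.0, 1.0, 1.0),    # Polygon([(0, 0), (1, 1), (1, 0)])
    (0.0, 1.0, 1.0, 2.0),    # LineString([(0, 1), (1, 2)])
]

print(total_bounds(per_geometry_bounds))  # (0.0, -1.0, 3.0, 2.0)
```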
touches(other, align=None) pyspark.pandas.Series[source]

Returns a Series of dtype('bool') with value True for each aligned geometry that touches other.

An object is said to touch other if it has at least one point in common with other and its interior does not intersect with any part of the other. Overlapping features therefore do not touch.

Note: Sedona’s behavior may also differ from Geopandas for GeometryCollections.

The operation works on a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test if is touched.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.

Return type:

Series (bool)

Examples

>>> from shapely.geometry import Polygon, LineString, MultiPoint, Point
>>> from sedona.spark.geopandas import GeoSeries
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         MultiPoint([(0, 0), (0, 1)]),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (-2, 0), (0, -2)]),
...         LineString([(0, 1), (1, 1)]),
...         LineString([(1, 1), (3, 0)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2             LINESTRING (0 0, 2 2)
3         MULTIPOINT ((0 0), (0 1))
dtype: geometry
>>> s2
1    POLYGON ((0 0, -2 0, 0 -2, 0 0))
2               LINESTRING (0 1, 1 1)
3               LINESTRING (1 1, 3 0)
4                         POINT (0 1)
dtype: geometry

We can check if each geometry of GeoSeries touches a single geometry:

>>> line = LineString([(0, 0), (-1, -2)])
>>> s.touches(line)
0    True
1    True
2    True
3    True
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s.touches(s2, align=True)
0    False
1     True
2     True
3    False
4    False
dtype: bool
>>> s.touches(s2, align=False)
0     True
1    False
2     True
3    False
dtype: bool

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries touches any element of the other one.
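The difference between align=True and align=False can be sketched without any geometry at all. In the illustration below, plain dictionaries stand in for the two series (keys are index labels), and string equality stands in for the spatial predicate. The function rowwise is hypothetical; the behaviour for labels present on only one side (a False result) is inferred from the example output above:

```python
def rowwise(left, right, predicate, align=True):
    """Apply a binary predicate pairwise, by index label or by position."""
    if align:
        # Pair elements that share an index label; the result covers the union
        # of both indices, and labels missing on either side yield False.
        labels = sorted(set(left) | set(right))
        return {
            label: label in left and label in right and predicate(left[label], right[label])
            for label in labels
        }
    # Ignore labels: pair the first with the first, the second with the second,
    # and keep the left-hand index in the result.
    return {
        label: predicate(a, b)
        for (label, a), b in zip(left.items(), right.values())
    }


left = {0: "A", 1: "B", 2: "C", 3: "D"}
right = {1: "B", 2: "C", 3: "X", 4: "D"}


def same(a, b):
    return a == b


print(rowwise(left, right, same, align=True))
# {0: False, 1: True, 2: True, 3: False, 4: False}
print(rowwise(left, right, same, align=False))
# {0: False, 1: False, 2: False, 3: True}
```

In both modes each element is compared with at most one partner, which is why neither mode answers whether an element touches any element of the other series.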

transform(transformation, include_z=False)[source]
property type
property unary_union
union_all(method='unary', grid_size=None) BaseGeometry[source]

Returns a geometry containing the union of all geometries in the GeoSeries.

Sedona does not support the method or grid_size argument, so the user does not need to manually decide the algorithm being used.

Parameters:
  • method (str (default "unary")) – Not supported in Sedona.

  • grid_size (float, default None) – Not supported in Sedona.

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import box
>>> s = GeoSeries([box(0, 0, 1, 1), box(0, 0, 2, 2)])
>>> s
0    POLYGON ((1 0, 1 1, 0 1, 0 0, 1 0))
1    POLYGON ((2 0, 2 2, 0 2, 0 0, 2 0))
dtype: geometry
>>> s.union_all()
<POLYGON ((0 1, 0 2, 2 2, 2 0, 1 0, 0 0, 0 1))>
voronoi_polygons(tolerance=0.0, extend_to=None, only_edges=False)[source]
within(other, align=None) pyspark.pandas.Series[source]

Returns a Series of dtype('bool') with value True for each aligned geometry that is within other.

An object is said to be within other if at least one of its points is located in the interior and no points are located in the exterior of the other. If either object is empty, this operation returns False.

This is the inverse of contains in the sense that the expression a.within(b) == b.contains(a) always evaluates to True.

Note: Sedona’s behavior may also differ from Geopandas for GeometryCollections and for geometries that are equal.

The operation works on a 1-to-1 row-wise manner.

Parameters:
  • other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test if each geometry is within.

  • align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.

Return type:

Series (bool)

Examples

>>> from shapely.geometry import Polygon, LineString, Point
>>> from sedona.spark.geopandas import GeoSeries
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (1, 2), (0, 2)]),
...         LineString([(0, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (0, 2)]),
...         LineString([(0, 0), (0, 1)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 1 2, 0 2, 0 0))
2             LINESTRING (0 0, 0 2)
3                       POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2             LINESTRING (0 0, 0 2)
3             LINESTRING (0 0, 0 1)
4                       POINT (0 1)
dtype: geometry

We can check if each geometry of GeoSeries is within a single geometry:

>>> polygon = Polygon([(0, 0), (2, 2), (0, 2)])
>>> s.within(polygon)
0     True
1     True
2    False
3    False
dtype: bool

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

>>> s2.within(s)
0    False
1    False
2     True
3    False
4    False
dtype: bool
>>> s2.within(s, align=False)
1     True
2    False
3     True
4     True
dtype: bool

Notes

This method works in a row-wise manner. It does not check if an element of one GeoSeries is within any element of the other one.

property x: pyspark.pandas.Series

Return the x location of point geometries in a GeoSeries

Return type:

pyspark.pandas.Series

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Point
>>> s = GeoSeries([Point(1, 1), Point(2, 2), Point(3, 3)])
>>> s.x
0    1.0
1    2.0
2    3.0
dtype: float64
property y: pyspark.pandas.Series

Return the y location of point geometries in a GeoSeries

Return type:

pyspark.pandas.Series

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Point
>>> s = GeoSeries([Point(1, 1), Point(2, 2), Point(3, 3)])
>>> s.y
0    1.0
1    2.0
2    3.0
dtype: float64
property z: pyspark.pandas.Series

Return the z location of point geometries in a GeoSeries

Return type:

pyspark.pandas.Series

Examples

>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Point
>>> s = GeoSeries([Point(1, 1, 1), Point(2, 2, 2), Point(3, 3, 3)])
>>> s.z
0    1.0
1    2.0
2    3.0
dtype: float64

sedona.spark.geopandas.io module

sedona.spark.geopandas.io.read_file(filename: str, format: str | None = None, **kwargs)[source]

Alternate constructor to create a GeoDataFrame from a file.

Parameters:
  • filename (str) – File path or file handle to read from. If the path is a directory, Sedona will read all files in the directory into a dataframe.

  • format (str, default None) –

    The format of the file to read. If None, Sedona will infer the format from the file extension. Note, inferring the format from the file extension is not supported for directories. Options:

    • ”shapefile”

    • ”geojson”

    • ”geopackage”

    • ”geoparquet”

See also

GeoDataFrame.to_file

write GeoDataFrame to file

sedona.spark.geopandas.io.read_parquet(path, columns=None, storage_options=None, bbox=None, to_pandas_kwargs=None, **kwargs)[source]

Load a Parquet object from the file path, returning a GeoDataFrame.

If no geometry columns are read, this will raise a ValueError. In that case, use the pandas read_parquet method instead.

If ‘crs’ key is not present in the GeoParquet metadata associated with the Parquet object, it will default to “OGC:CRS84” according to the specification.

Parameters:
  • path (str, path object)

  • columns (list-like of strings, default=None) – Not currently supported in Sedona

  • storage_options (dict, optional) – Not currently supported in Sedona

  • bbox (tuple, optional) – Not currently supported in Sedona

  • to_pandas_kwargs (dict, optional) – Not currently supported in Sedona

Return type:

GeoDataFrame

Examples

>>> from sedona.spark.geopandas import read_parquet
>>> df = read_parquet("data.parquet")  # doctest: +SKIP

The columns argument is not currently supported in Sedona, so only the path is passed:

>>> df = read_parquet(
...     "data.parquet",
... )

sedona.spark.geopandas.sindex module

class sedona.spark.geopandas.sindex.SpatialIndex(geometry, index_type='strtree', column_name=None)[source]

Bases: object

A wrapper around Sedona’s spatial index functionality.

__init__(geometry, index_type='strtree', column_name=None)[source]

Initialize the SpatialIndex with geometry data.

Parameters:
  • geometry (np.array of Shapely geometries, PySparkDataFrame column, or PySparkDataFrame)

  • index_type (str, default "strtree") – The type of spatial index to use.

  • column_name (str, optional) – The column name to extract geometry from if geometry is a PySparkDataFrame.

intersection(bounds)[source]

Find geometries that intersect the given bounding box.

Parameters:

bounds (tuple) – Bounding box as (min_x, min_y, max_x, max_y).

Returns:

List of indices of matching geometries.

Return type:

list
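The semantics of a bounding-box query can be sketched with a brute-force scan. This is an illustration only: it assumes that boxes sharing just an edge or corner count as intersecting, it works on precomputed (min_x, min_y, max_x, max_y) tuples rather than geometries, and it does not reflect how Sedona's index is implemented. Both function names are hypothetical:

```python
def bbox_intersects(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def intersection(all_bounds, query_bounds):
    """Return the positions of all boxes that intersect the query box."""
    return [i for i, box in enumerate(all_bounds) if bbox_intersects(box, query_bounds)]


boxes = [
    (0.0, 0.0, 1.0, 1.0),
    (2.0, 2.0, 3.0, 3.0),
    (0.5, 0.5, 2.5, 2.5),
]

hits = intersection(boxes, (0.9, 0.9, 1.5, 1.5))
print(hits)  # [0, 2]
```

A spatial index such as an STR-tree avoids scanning every box by grouping nearby boxes hierarchically, but it answers the same question.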

property is_empty

Check if the spatial index is empty.

Returns:

True if the index is empty, False otherwise.

Return type:

bool

nearest(geometry, k=1, return_distance=False)[source]

Find the nearest geometry in the spatial index.

Parameters:
  • geometry (Shapely geometry) – The geometry to find the nearest neighbor for.

  • k (int, optional, default 1) – Number of nearest neighbors to find.

  • return_distance (bool, optional, default False) – Whether to return distances along with indices.

Returns:

List of indices of nearest geometries, optionally with distances.

Return type:

list or tuple
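What a k-nearest-neighbour query computes can be illustrated with a brute-force search over points. The sketch below makes several assumptions that are not confirmed by this reference: it handles point coordinates only, it uses Euclidean distance, and it returns an (indices, distances) pair when return_distance is True. It is not Sedona's implementation:

```python
import heapq
import math


def nearest(points, target, k=1, return_distance=False):
    """Return positions of the k points closest to target, nearest first."""
    ranked = heapq.nsmallest(
        k, ((math.dist(point, target), i) for i, point in enumerate(points))
    )
    indices = [i for _, i in ranked]
    if return_distance:
        return indices, [distance for distance, _ in ranked]
    return indices


points = [(0.0, 0.0), (5.0, 5.0), (1.0, 1.0), (10.0, 0.0)]

print(nearest(points, (0.9, 0.9), k=2))  # [2, 0]
indices, distances = nearest(points, (0.9, 0.9), k=1, return_distance=True)
print(indices, distances)
```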

query(geometry, predicate=None, sort=False)[source]

Query the spatial index for geometries that intersect the given geometry.

Parameters:
  • geometry (Shapely geometry) – The geometry to query against the spatial index.

  • predicate (str, optional) – Spatial predicate to filter results. Must be either ‘intersects’ (default) or ‘contains’.

  • sort (bool, optional, default False) – Whether to sort the results.

Returns:

List of indices of matching geometries.

Return type:

list

property size

Get the size of the spatial index.

Returns:

Number of geometries in the index.

Return type:

int

Module contents

Added in version 1.8.0: geopandas API on Sedona