sedona.spark.geopandas package
Subpackages
Submodules
sedona.spark.geopandas.base module
A base class that lets a Sedona/Spark DataFrame or Column behave like a geopandas GeoDataFrame or GeoSeries.
- class sedona.spark.geopandas.base.GeoFrame
Bases: object
A base class for both GeoDataFrame and GeoSeries.
- property area: pyspark.pandas.Series
Returns a Series containing the area of each geometry in the GeoSeries expressed in the units of the CRS.
- Returns:
A Series containing the area of each geometry.
- Return type:
Series
Examples
>>> from shapely.geometry import Polygon
>>> from sedona.spark.geopandas import GeoSeries
>>> gs = GeoSeries([Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]), Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])])
>>> gs.area
0    1.0
1    4.0
dtype: float64
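The values in the example can be reproduced by hand with the planar shoelace formula. The following is a minimal pure-Python sketch for illustration only; it is not how Sedona computes the area, `ring_area` is a hypothetical helper, and interior rings are ignored.

```python
def ring_area(coords):
    """Unsigned planar area of a simple ring given as (x, y) tuples (shoelace formula)."""
    twice_area = 0.0
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around so the ring is closed
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
big_square = [(0, 0), (2, 0), (2, 2), (0, 2)]
areas = [ring_area(unit_square), ring_area(big_square)]
print(areas)  # [1.0, 4.0]
```

The result is in squared CRS units, so a geographic CRS such as EPSG:4326 yields square degrees rather than a metric area.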
- property boundary
Returns a GeoSeries of lower dimensional objects representing each geometry’s set-theoretic boundary.
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...         Point(0, 0),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1    LINESTRING (0 0, 1 1, 1 0)
2    POINT (0 0)
dtype: geometry
>>> s.boundary
0    LINESTRING (0 0, 1 1, 0 1, 0 0)
1    MULTIPOINT ((0 0), (1 0))
2    GEOMETRYCOLLECTION EMPTY
dtype: geometry
See also
GeoSeries.exterior
outer boundary (without interior rings)
- property bounds: pyspark.pandas.DataFrame
Returns a DataFrame with columns minx, miny, maxx, maxy values containing the bounds for each geometry.
See GeoSeries.total_bounds for the limits of the entire series.
Examples
>>> import geopandas
>>> from shapely.geometry import Point, Polygon, LineString
>>> d = {'geometry': [Point(2, 1), Polygon([(0, 0), (1, 1), (1, 0)]),
...                   LineString([(0, 1), (1, 2)])]}
>>> gdf = geopandas.GeoDataFrame(d, crs="EPSG:4326")
>>> gdf.bounds
   minx  miny  maxx  maxy
0   2.0   1.0   2.0   1.0
1   0.0   0.0   1.0   1.0
2   0.0   1.0   1.0   2.0
You can assign the bounds to the GeoDataFrame as:
>>> import pandas as pd
>>> gdf = pd.concat([gdf, gdf.bounds], axis=1)
>>> gdf
                         geometry  minx  miny  maxx  maxy
0                     POINT (2 1)   2.0   1.0   2.0   1.0
1  POLYGON ((0 0, 1 1, 1 0, 0 0))   0.0   0.0   1.0   1.0
2           LINESTRING (0 1, 1 2)   0.0   1.0   1.0   2.0
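Conceptually, each row of the bounds table is the minimum and maximum of a geometry's coordinates. A minimal pure-Python sketch, assuming geometries are given as plain coordinate lists (the `bounds` helper and the `geometries` dictionary are hypothetical and only mirror the example data):

```python
def bounds(coords):
    """Return (minx, miny, maxx, maxy) for a list of (x, y) tuples."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (float(min(xs)), float(min(ys)), float(max(xs)), float(max(ys)))


# Coordinates of the three geometries used in the example.
geometries = {
    "point": [(2, 1)],
    "polygon": [(0, 0), (1, 1), (1, 0)],
    "linestring": [(0, 1), (1, 2)],
}
rows = {name: bounds(coords) for name, coords in geometries.items()}
for name, row in rows.items():
    print(name, row)
```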
- buffer(distance, resolution=16, cap_style='round', join_style='round', mitre_limit=5.0, single_sided=False, **kwargs)
Returns a GeoSeries with all geometries buffered by the specified distance.
- Parameters:
distance (float) – The distance to buffer by. Negative distances will create inward buffers.
resolution (int, default 16) – The resolution of the buffer around each vertex. Specifies the number of linear segments in a quarter circle in the approximation of circular arcs.
cap_style (str, default "round") – The style of the buffer cap. One of ‘round’, ‘flat’, ‘square’.
join_style (str, default "round") – The style of the buffer join. One of ‘round’, ‘mitre’, ‘bevel’.
mitre_limit (float, default 5.0) – The mitre limit ratio for joins when join_style=’mitre’.
single_sided (bool, default False) – Whether to create a single-sided buffer. In Sedona, True will default to left-sided buffer. However, ‘right’ may be specified to use a right-sided buffer.
- Returns:
A new GeoSeries with buffered geometries.
- Return type:
GeoSeries
Examples
>>> from shapely.geometry import Point
>>> from sedona.spark.geopandas import GeoDataFrame
>>>
>>> data = {
...     'geometry': [Point(0, 0), Point(1, 1)],
...     'value': [1, 2]
... }
>>> gdf = GeoDataFrame(data)
>>> buffered = gdf.buffer(0.5)
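To illustrate what the resolution parameter means, the sketch below approximates a round buffer around a single point with `4 * resolution` vertices, that is, `resolution` linear segments per quarter circle. It is plain Python for illustration only; `point_buffer` and `ring_area` are hypothetical helpers, and the actual buffer implementation may place vertices differently.

```python
import math


def point_buffer(x, y, distance, resolution=16):
    """Approximate a round buffer around (x, y) with 4 * resolution vertices."""
    n = 4 * resolution  # `resolution` segments per quarter circle
    return [
        (x + distance * math.cos(2 * math.pi * k / n),
         y + distance * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def ring_area(coords):
    """Unsigned shoelace area of a simple ring."""
    n = len(coords)
    total = sum(
        coords[i][0] * coords[(i + 1) % n][1] - coords[(i + 1) % n][0] * coords[i][1]
        for i in range(n)
    )
    return abs(total) / 2.0


ring = point_buffer(0.0, 0.0, 0.5)
approx_area = ring_area(ring)
exact_area = math.pi * 0.5 ** 2
print(len(ring), approx_area, exact_area)
```

With the default resolution the polygon area falls slightly below the true circle area, and a higher resolution narrows the gap at the cost of more vertices.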
- property centroid
Returns a GeoSeries of points representing the centroid of each geometry.
Note that the centroid does not have to be on or within the original geometry.
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...         Point(0, 0),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1    LINESTRING (0 0, 1 1, 1 0)
2    POINT (0 0)
dtype: geometry
>>> s.centroid
0    POINT (0.33333 0.66667)
1    POINT (0.70711 0.5)
2    POINT (0 0)
dtype: geometry
See also
GeoSeries.representative_point
point guaranteed to be within each geometry
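The centroid values in the example can be checked by hand: a polygon centroid is area-weighted and a linestring centroid is length-weighted. The following pure-Python sketch is illustrative only; `polygon_centroid` and `linestring_centroid` are hypothetical helpers, not part of Sedona, and holes are ignored.

```python
import math


def polygon_centroid(coords):
    """Area-weighted centroid of a simple ring of (x, y) tuples."""
    twice_area = 0.0
    cx = cy = 0.0
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        twice_area += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    return (cx / (3.0 * twice_area), cy / (3.0 * twice_area))


def linestring_centroid(coords):
    """Length-weighted average of the segment midpoints."""
    total = sx = sy = 0.0
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        seg = math.hypot(x2 - x1, y2 - y1)
        total += seg
        sx += seg * (x1 + x2) / 2.0
        sy += seg * (y1 + y2) / 2.0
    return (sx / total, sy / total)


poly_c = polygon_centroid([(0, 0), (1, 1), (0, 1)])
line_c = linestring_centroid([(0, 0), (1, 1), (1, 0)])
print(poly_c, line_c)
```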
- contains(other, align=None)
Returns a Series of dtype('bool') with value True for each aligned geometry that contains other.
An object is said to contain other if at least one point of other lies in the interior and no points of other lie in the exterior of the object. (Therefore, any given polygon does not contain its own boundary - there is not any point that lies in the interior.) If either object is empty, this operation returns False.
This is the inverse of within in the sense that the expression a.contains(b) == b.within(a) always evaluates to True.
Note: Sedona’s implementation instead returns False for identical geometries.
The operation works in a 1-to-1 row-wise manner.
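- Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test for containment.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.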
- Return type:
Series (bool)
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (0, 2)]),
...         LineString([(0, 0), (0, 1)]),
...         Point(0, 1),
...     ],
...     index=range(0, 4),
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (1, 2), (0, 2)]),
...         LineString([(0, 0), (0, 2)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1    LINESTRING (0 0, 0 2)
2    LINESTRING (0 0, 0 1)
3    POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2    POLYGON ((0 0, 1 2, 0 2, 0 0))
3    LINESTRING (0 0, 0 2)
4    POINT (0 1)
dtype: geometry
We can check if each geometry of GeoSeries contains a single geometry:
>>> point = Point(0, 1)
>>> s.contains(point)
0    False
1     True
2    False
3     True
dtype: bool
We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:
>>> s2.contains(s, align=True)
0    False
1    False
2    False
3     True
4    False
dtype: bool
>>> s2.contains(s, align=False)
1     True
2    False
3     True
4     True
dtype: bool
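The difference between the two alignment modes is independent of the predicate being evaluated. As a rough sketch, assuming each series is modeled as a plain `{index_label: value}` dictionary (the `pair_rows` helper and the placeholder values are hypothetical), the rows are paired like this before the row-wise test runs:

```python
def pair_rows(left, right, align=True):
    """Pair the values of two {index_label: value} mappings.

    align=True  pairs by index label over the sorted union of both indices;
                a label missing on one side contributes None.
    align=False ignores the right-hand labels and pairs by position,
                keeping the left-hand labels.
    """
    if align:
        labels = sorted(set(left) | set(right))
        return {label: (left.get(label), right.get(label)) for label in labels}
    if len(left) != len(right):
        raise ValueError("positional pairing needs equal lengths")
    return {label: (lv, rv) for (label, lv), rv in zip(left.items(), right.values())}


# Placeholder values standing in for geometries; indices match the example.
s = {0: "a0", 1: "a1", 2: "a2", 3: "a3"}
s2 = {1: "b1", 2: "b2", 3: "b3", 4: "b4"}

aligned = pair_rows(s2, s, align=True)
positional = pair_rows(s2, s, align=False)
print(aligned)
print(positional)
```

This matches the shape of the results above: with align=True the output covers labels 0 through 4 and the unmatched rows evaluate to False, while with align=False the output keeps the labels of the calling series.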
Notes
This method works in a row-wise manner. It does not check if an element of one GeoSeries contains any element of the other one.
See also
GeoSeries.contains_properly, GeoSeries.within
- covered_by(other, align=None)
Returns a Series of dtype('bool') with value True for each aligned geometry that is entirely covered by other.
An object A is said to cover another object B if no points of B lie in the exterior of A.
Note: Sedona’s implementation instead returns False for identical geometries. Sedona’s behavior may differ from GeoPandas for GeometryCollections.
The operation works in a 1-to-1 row-wise manner.
See https://lin-ear-th-inking.blogspot.com/2007/06/subtleties-of-ogc-covers-spatial.html for reference.
- Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to check whether it covers each geometry.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.
- Return type:
Series (bool)
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]),
...         Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
...         LineString([(1, 1), (1.5, 1.5)]),
...         Point(0, 0),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         Point(0, 0),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0.5 0.5, 1.5 0.5, 1.5 1.5, 0.5 1.5, ...
1    POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))
2    LINESTRING (1 1, 1.5 1.5)
3    POINT (0 0)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))
2    POLYGON ((0 0, 2 2, 0 2, 0 0))
3    LINESTRING (0 0, 2 2)
4    POINT (0 0)
dtype: geometry
We can check if each geometry of GeoSeries is covered by a single geometry:
>>> poly = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
>>> s.covered_by(poly)
0    True
1    True
2    True
3    True
dtype: bool
We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:
>>> s.covered_by(s2, align=True)
0    False
1     True
2     True
3     True
4    False
dtype: bool
>>> s.covered_by(s2, align=False)
0     True
1    False
2     True
3     True
dtype: bool
Notes
This method works in a row-wise manner. It does not check if an element of one GeoSeries is covered_by any element of the other one.
See also
GeoSeries.covers, GeoSeries.overlaps
- covers(other, align=None)
Returns a Series of dtype('bool') with value True for each aligned geometry that is entirely covering other.
An object A is said to cover another object B if no points of B lie in the exterior of A. If either object is empty, this operation returns False.
Note: Sedona’s implementation instead returns False for identical geometries. Sedona’s behavior may also differ from GeoPandas for GeometryCollections.
The operation works in a 1-to-1 row-wise manner.
See https://lin-ear-th-inking.blogspot.com/2007/06/subtleties-of-ogc-covers-spatial.html for reference.
- Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to check whether it is covered.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.
- Return type:
Series (bool)
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         Point(0, 0),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]),
...         Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
...         LineString([(1, 1), (1.5, 1.5)]),
...         Point(0, 0),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2    LINESTRING (0 0, 2 2)
3    POINT (0 0)
dtype: geometry
>>> s2
1    POLYGON ((0.5 0.5, 1.5 0.5, 1.5 1.5, 0.5 1.5, ...
2    POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))
3    LINESTRING (1 1, 1.5 1.5)
4    POINT (0 0)
dtype: geometry
We can check if each geometry of GeoSeries covers a single geometry:
>>> poly = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
>>> s.covers(poly)
0     True
1    False
2    False
3    False
dtype: bool
We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:
>>> s.covers(s2, align=True)
0    False
1    False
2    False
3    False
4    False
dtype: bool
>>> s.covers(s2, align=False)
0     True
1    False
2     True
3     True
dtype: bool
Notes
This method works in a row-wise manner. It does not check if an element of one GeoSeries covers any element of the other one.
See also
GeoSeries.covered_by, GeoSeries.overlaps
- crosses(other, align=None) → pyspark.pandas.Series
Returns a Series of dtype('bool') with value True for each aligned geometry that crosses other.
An object is said to cross other if its interior intersects the interior of the other but does not contain it, and the dimension of the intersection is less than the dimension of the one or the other.
Note: Unlike GeoPandas, Sedona’s implementation always returns NULL when a GeometryCollection is involved.
The operation works in a 1-to-1 row-wise manner.
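- Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test for crossing.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.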
- Return type:
Series (bool)
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         LineString([(1, 0), (1, 3)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(1, 1),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    LINESTRING (0 0, 2 2)
2    LINESTRING (2 0, 0 2)
3    POINT (0 1)
dtype: geometry
>>> s2
1    LINESTRING (1 0, 1 3)
2    LINESTRING (2 0, 0 2)
3    POINT (1 1)
4    POINT (0 1)
dtype: geometry
We can check if each geometry of GeoSeries crosses a single geometry:
>>> line = LineString([(-1, 1), (3, 1)])
>>> s.crosses(line)
0     True
1     True
2     True
3    False
dtype: bool
We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:
>>> s.crosses(s2, align=True)
0    False
1     True
2    False
3    False
4    False
dtype: bool
>>> s.crosses(s2, align=False)
0     True
1     True
2    False
3    False
dtype: bool
Notice that a line does not cross a point that it contains.
Notes
This method works in a row-wise manner. It does not check if an element of one GeoSeries crosses any element of the other one.
See also
GeoSeries.disjoint, GeoSeries.intersects
- difference(other, align=None)
Returns a GeoSeries of the points in each aligned geometry that are not in other.
The operation works in a 1-to-1 row-wise manner.
Unlike GeoPandas, Sedona does not support this operation for GeometryCollections.
- Parameters:
other (Geoseries or geometric object) – The Geoseries (elementwise) or geometric object to find the difference to.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.
- Return type:
GeoSeries
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(1, 0), (1, 3)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(1, 1),
...         Point(0, 1),
...     ],
...     index=range(1, 6),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2    LINESTRING (0 0, 2 2)
3    LINESTRING (2 0, 0 2)
4    POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2    LINESTRING (1 0, 1 3)
3    LINESTRING (2 0, 0 2)
4    POINT (1 1)
5    POINT (0 1)
dtype: geometry
We can compute the difference between each geometry of a GeoSeries and a single geometry:
>>> point = Point(0, 1)
>>> s2.difference(point)
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2    LINESTRING (1 0, 1 3)
3    LINESTRING (2 0, 0 2)
4    POINT (1 1)
5    GEOMETRYCOLLECTION EMPTY
dtype: geometry
We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:
>>> s.difference(s2, align=True)
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2    LINESTRING (0 0, 2 2)
3    LINESTRING (2 0, 0 2)
4    POINT (0 1)
5    POINT (0 1)
dtype: geometry
>>> s.difference(s2, align=False)
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2    GEOMETRYCOLLECTION EMPTY
3    LINESTRING (2 0, 0 2)
4    GEOMETRYCOLLECTION EMPTY
dtype: geometry
Notes
This method works in a row-wise manner. It does not check if an element of one GeoSeries is different from any element of the other one.
See also
GeoSeries.intersection
- distance(other, align=None)
Returns a Series containing the distance to aligned other.
The operation works in a 1-to-1 row-wise manner.
- Parameters:
other (Geoseries or geometric object) – The Geoseries (elementwise) or geometric object to find the distance to.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.
- Return type:
Series (float)
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 0), (1, 1)]),
...         Polygon([(0, 0), (-1, 0), (-1, 1)]),
...         LineString([(1, 1), (0, 0)]),
...         Point(0, 0),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]),
...         Point(3, 1),
...         LineString([(1, 0), (2, 0)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 1 0, 1 1, 0 0))
1    POLYGON ((0 0, -1 0, -1 1, 0 0))
2    LINESTRING (1 1, 0 0)
3    POINT (0 0)
dtype: geometry
>>> s2
1    POLYGON ((0.5 0.5, 1.5 0.5, 1.5 1.5, 0.5 1.5, ...
2    POINT (3 1)
3    LINESTRING (1 0, 2 0)
4    POINT (0 1)
dtype: geometry
We can check the distance of each geometry of GeoSeries to a single geometry:
>>> point = Point(-1, 0)
>>> s.distance(point)
0    1.0
1    0.0
2    1.0
3    1.0
dtype: float64
We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and use elements with the same index using align=True or ignore index and use elements based on their matching order using align=False:
>>> s.distance(s2, align=True)
0         NaN
1    0.707107
2    2.000000
3    1.000000
4         NaN
dtype: float64
>>> s.distance(s2, align=False)
0    0.000000
1    3.162278
2    0.707107
3    1.000000
dtype: float64
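Two of the values above can be verified with a short point-to-segment calculation. The sketch below is plain Python for illustration, not Sedona's implementation; `point_segment_distance` and `point_linework_distance` are hypothetical helpers, and the polygon case is only valid when the point lies outside the polygon (a point inside would have distance 0).

```python
import math


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment from a to b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate segment
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the parameter to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_linework_distance(p, coords, closed=False):
    """Minimum distance from p to a linestring, or to a ring if closed=True."""
    pts = list(coords)
    if closed:
        pts.append(pts[0])
    return min(point_segment_distance(p, a, b) for a, b in zip(pts, pts[1:]))


# LineString (1 1, 0 0) to Point(-1, 0), as in s.distance(point).
d_line = point_linework_distance((-1, 0), [(1, 1), (0, 0)])
# Polygon (0 0, -1 0, -1 1) to Point(3, 1), row 1 of the align=False result.
d_poly = point_linework_distance((3, 1), [(0, 0), (-1, 0), (-1, 1)], closed=True)
print(d_line, round(d_poly, 6))
```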
- dwithin(other, distance, align=None)
Returns a Series of dtype('bool') with value True for each aligned geometry that is within a set distance from other.
The operation works in a 1-to-1 row-wise manner.
- Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test for being within the given distance.
distance (float, np.array, pd.Series) – Distance(s) to test if each geometry is within. A scalar distance will be applied to all geometries. An array or Series will be applied elementwise. If np.array or pd.Series are used then it must have same length as the GeoSeries.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.
- Return type:
Series (bool)
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (0, 2)]),
...         LineString([(0, 0), (0, 1)]),
...         Point(0, 1),
...     ],
...     index=range(0, 4),
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(1, 0), (4, 2), (2, 2)]),
...         Polygon([(2, 0), (3, 2), (2, 2)]),
...         LineString([(2, 0), (2, 2)]),
...         Point(1, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1    LINESTRING (0 0, 0 2)
2    LINESTRING (0 0, 0 1)
3    POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((1 0, 4 2, 2 2, 1 0))
2    POLYGON ((2 0, 3 2, 2 2, 2 0))
3    LINESTRING (2 0, 2 2)
4    POINT (1 1)
dtype: geometry
We can check if each geometry of GeoSeries is within a set distance of a single geometry:
>>> point = Point(0, 1)
>>> s2.dwithin(point, 1.8)
1     True
2    False
3    False
4     True
dtype: bool
We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:
>>> s.dwithin(s2, distance=1, align=True)
0    False
1     True
2    False
3    False
4    False
dtype: bool
>>> s.dwithin(s2, distance=1, align=False)
0     True
1    False
2    False
3     True
dtype: bool
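The distance argument can be a scalar, applied to every row, or a sequence, applied elementwise. A minimal sketch of that broadcasting rule for point geometries only, assuming an inclusive comparison (distance less than or equal to the threshold); the `dwithin_points` helper and its inputs are hypothetical:

```python
import math


def dwithin_points(points, other, distance):
    """Row-wise test of whether each point lies within `distance` of `other`.

    A scalar distance is applied to every row; a list or tuple is applied
    elementwise and must have the same length as `points`.
    """
    if isinstance(distance, (list, tuple)):
        distances = list(distance)
    else:
        distances = [distance] * len(points)
    if len(distances) != len(points):
        raise ValueError("distance must match the number of geometries")
    return [math.dist(p, other) <= d for p, d in zip(points, distances)]


points = [(1, 1), (3, 1)]
scalar_result = dwithin_points(points, (0, 1), 1.8)
elementwise_result = dwithin_points(points, (0, 1), [0.5, 3.0])
print(scalar_result, elementwise_result)
```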
Notes
This method works in a row-wise manner. It does not check if an element of one GeoSeries is within the set distance of any element of the other one.
See also
GeoSeries.within
- property envelope
Returns a GeoSeries of geometries representing the envelope of each geometry.
The envelope of a geometry is the bounding rectangle. That is, the point or smallest rectangular polygon (with sides parallel to the coordinate axes) that contains the geometry.
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point, MultiPoint
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...         MultiPoint([(0, 0), (1, 1)]),
...         Point(0, 0),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1    LINESTRING (0 0, 1 1, 1 0)
2    MULTIPOINT ((0 0), (1 1))
3    POINT (0 0)
dtype: geometry
>>> s.envelope
0    POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))
1    POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))
2    POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))
3    POINT (0 0)
dtype: geometry
See also
GeoSeries.convex_hull
convex hull geometry
- property geom_type
Returns a Series of strings specifying the geometry type of each geometry.
Note: Unlike Geopandas, Sedona returns LineString instead of LinearRing.
- Returns:
A Series containing the geometry type of each geometry.
- Return type:
Series
Examples
>>> from shapely.geometry import Polygon, Point
>>> from sedona.spark.geopandas import GeoSeries
>>> gs = GeoSeries([Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]), Point(0, 0)])
>>> gs.geom_type
0    POLYGON
1    POINT
dtype: object
- get_geometry(index)
Returns the n-th geometry from a collection of geometries (0-indexed).
If the index is non-negative, it returns the geometry at that index. If the index is negative, it counts backward from the end of the collection (e.g., -1 returns the last geometry). Returns None if the index is out of bounds.
Note: Simple geometries act as length-1 collections.
Note: Using Shapely < 2.0 may lead to different results for empty simple geometries due to how Shapely interprets them.
- Parameters:
index (int or array_like) – Position of a geometry to be retrieved within its collection
- Return type:
GeoSeries
Notes
Simple geometries act as collections of length 1. Any out-of-range index value returns None.
Examples
>>> import geopandas
>>> from shapely.geometry import Point, MultiPoint, GeometryCollection, Polygon
>>> s = geopandas.GeoSeries(
...     [
...         Point(0, 0),
...         MultiPoint([(0, 0), (1, 1), (0, 1), (1, 0)]),
...         GeometryCollection(
...             [MultiPoint([(0, 0), (1, 1), (0, 1), (1, 0)]), Point(0, 1)]
...         ),
...         Polygon(),
...         GeometryCollection(),
...     ]
... )
>>> s
0    POINT (0 0)
1    MULTIPOINT ((0 0), (1 1), (0 1), (1 0))
2    GEOMETRYCOLLECTION (MULTIPOINT ((0 0), (1 1), ...
3    POLYGON EMPTY
4    GEOMETRYCOLLECTION EMPTY
dtype: geometry
>>> s.get_geometry(0)
0    POINT (0 0)
1    POINT (0 0)
2    MULTIPOINT ((0 0), (1 1), (0 1), (1 0))
3    POLYGON EMPTY
4    None
dtype: geometry
>>> s.get_geometry(1)
0    None
1    POINT (1 1)
2    POINT (0 1)
3    None
4    None
dtype: geometry
>>> s.get_geometry(-1)
0    POINT (0 0)
1    POINT (1 0)
2    POINT (0 1)
3    POLYGON EMPTY
4    None
dtype: geometry
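The indexing rules (non-negative and negative indices, out-of-range values, and simple geometries acting as length-1 collections) can be modeled in a few lines. In this sketch collections are plain Python lists and simple geometries are WKT strings; `get_part` is a hypothetical helper and only mirrors the documented behavior.

```python
def get_part(geom, index):
    """Return the index-th part of `geom`, or None if the index is out of range."""
    # A simple geometry behaves like a collection of length 1.
    parts = geom if isinstance(geom, list) else [geom]
    if -len(parts) <= index < len(parts):
        return parts[index]
    return None


series = [
    "POINT (0 0)",
    ["POINT (0 0)", "POINT (1 1)", "POINT (0 1)", "POINT (1 0)"],  # MultiPoint
    ["MULTIPOINT ((0 0), (1 1), (0 1), (1 0))", "POINT (0 1)"],    # GeometryCollection
    "POLYGON EMPTY",
    [],                                                            # empty GeometryCollection
]

first = [get_part(g, 0) for g in series]
second = [get_part(g, 1) for g in series]
last = [get_part(g, -1) for g in series]
print(first)
print(second)
print(last)
```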
- property has_sindex
Check the existence of the spatial index without generating it.
Use the .sindex attribute on a GeoDataFrame or GeoSeries to generate a spatial index if it does not yet exist, which may take considerable time based on the underlying index implementation.
Note that the underlying spatial index may not be fully initialized until the first use.
Currently, sindex is not retained when calling this method from a GeoDataFrame. You can work around this by first extracting the active geometry column as a GeoSeries and then calling this method on it.
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Point
>>> s = GeoSeries([Point(x, x) for x in range(5)])
>>> s.has_sindex
False
>>> index = s.sindex
>>> s.has_sindex
True
- Returns:
True if the spatial index has been generated or False if not.
- Return type:
bool
- property has_z
Returns a Series of dtype('bool') with value True for features that have a z-component.
Notes
Every operation in GeoPandas is planar, i.e. the potential third dimension is not taken into account.
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Point
>>> s = GeoSeries(
...     [
...         Point(0, 1),
...         Point(0, 1, 2),
...     ]
... )
>>> s
0    POINT (0 1)
1    POINT Z (0 1 2)
dtype: geometry
>>> s.has_z
0    False
1     True
dtype: bool
- intersection(other, align=None)
Returns a GeoSeries of the intersection of points in each aligned geometry with other.
The operation works in a 1-to-1 row-wise manner.
Note: Unlike most functions, intersection may return results that are unordered with respect to the index. If this is important to you, you may call sort_index() on the result.
- Parameters:
other (Geoseries or geometric object) – The Geoseries (elementwise) or geometric object to find the intersection with.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.
- Return type:
GeoSeries
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(1, 0), (1, 3)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(1, 1),
...         Point(0, 1),
...     ],
...     index=range(1, 6),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2    LINESTRING (0 0, 2 2)
3    LINESTRING (2 0, 0 2)
4    POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2    LINESTRING (1 0, 1 3)
3    LINESTRING (2 0, 0 2)
4    POINT (1 1)
5    POINT (0 1)
dtype: geometry
We can also do intersection of each geometry and a single shapely geometry:
>>> s.intersection(Polygon([(0, 0), (1, 1), (0, 1)]))
0    POLYGON ((0 0, 0 1, 1 1, 0 0))
1    POLYGON ((0 0, 0 1, 1 1, 0 0))
2    LINESTRING (0 0, 1 1)
3    POINT (1 1)
4    POINT (0 1)
dtype: geometry
We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:
>>> s.intersection(s2, align=True)
0    None
1    POLYGON ((0 0, 0 1, 1 1, 0 0))
2    POINT (1 1)
3    LINESTRING (2 0, 0 2)
4    POINT EMPTY
5    None
dtype: geometry
>>> s.intersection(s2, align=False)
0    POLYGON ((0 0, 0 1, 1 1, 0 0))
1    LINESTRING (1 1, 1 2)
2    POINT (1 1)
3    POINT (1 1)
4    POINT (0 1)
dtype: geometry
See also
GeoSeries.difference, GeoSeries.symmetric_difference, GeoSeries.union
- intersects(other, align=None)
Returns a Series of dtype('bool') with value True for each aligned geometry that intersects other.
An object is said to intersect other if its boundary and interior intersect in any way with those of the other.
The operation works in a 1-to-1 row-wise manner.
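- Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test for intersection.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.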
- Return type:
Series (bool)
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         LineString([(1, 0), (1, 3)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(1, 1),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    LINESTRING (0 0, 2 2)
2    LINESTRING (2 0, 0 2)
3    POINT (0 1)
dtype: geometry
>>> s2
1    LINESTRING (1 0, 1 3)
2    LINESTRING (2 0, 0 2)
3    POINT (1 1)
4    POINT (0 1)
dtype: geometry
We can check if each geometry of GeoSeries intersects a single geometry:
>>> line = LineString([(-1, 1), (3, 1)])
>>> s.intersects(line)
0    True
1    True
2    True
3    True
dtype: bool
We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:
>>> s.intersects(s2, align=True)
0    False
1     True
2     True
3    False
4    False
dtype: bool
>>> s.intersects(s2, align=False)
0    True
1    True
2    True
3    True
dtype: bool
Notes
This method works in a row-wise manner. It does not check if an element of one GeoSeries intersects any element of the other one.
See also
GeoSeries.disjoint, GeoSeries.crosses, GeoSeries.touches, GeoSeries.intersection
- property is_empty
Returns a Series of dtype('bool') with value True for empty geometries.
Examples
An example of a GeoSeries with one empty point, one point and one missing value:
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Point
>>> geoseries = GeoSeries([Point(), Point(2, 1), None], crs="EPSG:4326")
>>> geoseries
0    POINT EMPTY
1    POINT (2 1)
2    None
>>> geoseries.is_empty
0     True
1    False
2    False
dtype: bool
See also
GeoSeries.isna
detect missing geometries
- property is_ring
Returns a Series of dtype('bool') with value True for features that are closed.
When constructing a LinearRing, the sequence of coordinates may be explicitly closed by passing identical values in the first and last indices. Otherwise, the sequence will be implicitly closed by copying the first tuple to the last index.
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import LineString, LinearRing
>>> s = GeoSeries(
...     [
...         LineString([(0, 0), (1, 1), (1, -1)]),
...         LineString([(0, 0), (1, 1), (1, -1), (0, 0)]),
...         LinearRing([(0, 0), (1, 1), (1, -1)]),
...     ]
... )
>>> s
0    LINESTRING (0 0, 1 1, 1 -1)
1    LINESTRING (0 0, 1 1, 1 -1, 0 0)
2    LINEARRING (0 0, 1 1, 1 -1, 0 0)
dtype: geometry
>>> s.is_ring
0    False
1     True
2     True
dtype: bool
- property is_simple
Returns a Series of dtype('bool') with value True for geometries that do not cross themselves.
This is meaningful only for LineStrings and LinearRings.
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import LineString
>>> s = GeoSeries(
...     [
...         LineString([(0, 0), (1, 1), (1, -1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, -1)]),
...     ]
... )
>>> s
0    LINESTRING (0 0, 1 1, 1 -1, 0 1)
1    LINESTRING (0 0, 1 1, 1 -1)
dtype: geometry
>>> s.is_simple
0    False
1     True
dtype: bool
- property is_valid
Returns a Series of dtype('bool') with value True for geometries that are valid.
Examples
An example with one invalid polygon (a bowtie geometry crossing itself) and one missing geometry:
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         Polygon([(0,0), (1, 1), (1, 0), (0, 1)]),  # bowtie geometry
...         Polygon([(0, 0), (2, 2), (2, 0)]),
...         None
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1    POLYGON ((0 0, 1 1, 1 0, 0 1, 0 0))
2    POLYGON ((0 0, 2 2, 2 0, 0 0))
3    None
dtype: geometry
>>> s.is_valid
0     True
1    False
2     True
3    False
dtype: bool
See also
GeoSeries.is_valid_reason
reason for invalidity
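To see why the bowtie polygon is invalid, one can test whether any two non-adjacent edges of its ring cross. The sketch below uses a standard orientation test in plain Python. It is only an illustration of this one failure mode, not a full OGC validity check and not Sedona's implementation; all helper names are hypothetical.

```python
def orient(a, b, c):
    """Signed area test: > 0 if c is left of the line a->b, < 0 if right, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(a, b, c, d):
    """True if segments ab and cd cross at a single interior point."""
    return (orient(a, b, c) * orient(a, b, d) < 0
            and orient(c, d, a) * orient(c, d, b) < 0)


def ring_self_crosses(coords):
    """True if any two non-adjacent edges of the ring properly cross.

    `coords` lists the ring vertices without repeating the first one at the end.
    """
    n = len(coords)
    edges = [(coords[i], coords[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # the first and last edges share a vertex
            if segments_cross(*edges[i], *edges[j]):
                return True
    return False


triangle = [(0, 0), (1, 1), (0, 1)]
bowtie = [(0, 0), (1, 1), (1, 0), (0, 1)]
print(ring_self_crosses(triangle), ring_self_crosses(bowtie))  # False True
```

Rings that merely touch themselves at a vertex are not detected by this proper-crossing test, so it is a narrower check than is_valid.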
- is_valid_reason()
Returns a Series of strings with the reason for invalidity of each geometry.
Examples
An example with two invalid polygons (a bowtie geometry crossing itself and a ring that touches itself) and one missing geometry:
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         Polygon([(0,0), (1, 1), (1, 0), (0, 1)]),  # bowtie geometry
...         Polygon([(0, 0), (2, 2), (2, 0)]),
...         Polygon([(0, 0), (2, 0), (1, 1), (2, 2), (0, 2), (1, 1), (0, 0)]),
...         None
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1    POLYGON ((0 0, 1 1, 1 0, 0 1, 0 0))
2    POLYGON ((0 0, 2 2, 2 0, 0 0))
3    POLYGON ((0 0, 2 0, 1 1, 2 2, 0 2, 1 1, 0 0))
4    None
dtype: geometry
>>> s.is_valid_reason()
0    Valid Geometry
1    Self-intersection at or near point (0.5, 0.5, NaN)
2    Valid Geometry
3    Ring Self-intersection at or near point (1.0, 1.0)
4    None
dtype: object
See also
GeoSeries.is_valid
detect invalid geometries
GeoSeries.make_valid
fix invalid geometries
- property length
Returns a Series containing the length of each geometry in the GeoSeries.
In the case of a (Multi)Polygon it measures the length of its exterior (i.e. perimeter).
For a GeometryCollection it sums the values for each of the individual geometries.
- Returns:
A Series containing the length of each geometry.
- Return type:
Series
Examples
>>> from shapely.geometry import GeometryCollection, LineString, Point, Polygon >>> from sedona.spark.geopandas import GeoSeries
>>> gs = GeoSeries([Point(0, 0), LineString([(0, 0), (1, 1)]), Polygon([(0, 0), (1, 0), (1, 1)]), GeometryCollection([Point(0, 0), LineString([(0, 0), (1, 1)]), Polygon([(0, 0), (1, 0), (1, 1)])])]) >>> gs.length 0 0.000000 1 1.414214 2 3.414214 3 4.828427 dtype: float64
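The values in this example can be checked by hand. The following pure-Python sketch reproduces them from the coordinate lists, assuming a polygon without holes; path_length is a hypothetical helper written for this illustration and is not part of Sedona.

```python
import math


def path_length(coords, closed=False):
    """Sum of Euclidean distances between consecutive vertices."""
    pts = list(coords)
    if closed and pts and pts[0] != pts[-1]:
        pts.append(pts[0])  # close the ring back to its first vertex
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))


point_len = 0.0  # a point has no length
line_len = path_length([(0, 0), (1, 1)])
ring_len = path_length([(0, 0), (1, 0), (1, 1)], closed=True)  # perimeter
collection_len = point_len + line_len + ring_len  # sum over the members

print(round(line_len, 6), round(ring_len, 6), round(collection_len, 6))
```

The three rounded values correspond to rows 1, 2 and 3 of the output above.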
- make_valid(*, method='linework', keep_collapsed=True)[source]
Repairs invalid geometries.
Returns a GeoSeries with valid geometries.
If the input geometry is already valid, then it will be preserved. In many cases, in order to create a valid geometry, the input geometry must be split into multiple parts or multiple geometries. If the geometry must be split into multiple parts of the same type to be made valid, then a multi-part geometry will be returned (e.g. a MultiPolygon). If the geometry must be split into multiple parts of different types to be made valid, then a GeometryCollection will be returned.
In Sedona, only the ‘structure’ method is available:
the ‘structure’ algorithm tries to reason from the structure of the input to find the ‘correct’ repair: exterior rings bound area, interior holes exclude area. It first makes all rings valid, then shells are merged and holes are subtracted from the shells to generate valid result. It assumes that holes and shells are correctly categorized in the input geometry.
- Parameters:
method ({'linework', 'structure'}, default 'linework') – Algorithm to use when repairing geometry. Sedona Geopandas only supports the ‘structure’ method. The default method is “linework” to match compatibility with Geopandas, but it must be explicitly set to ‘structure’ to use the Sedona implementation.
keep_collapsed (bool, default True) – For the ‘structure’ method, True will keep components that have collapsed into a lower dimensionality. For example, a ring collapsing to a line, or a line collapsing to a point.
Examples
>>> from sedona.spark.geopandas import GeoSeries >>> from shapely.geometry import MultiPolygon, Polygon, LineString, Point >>> s = GeoSeries( ... [ ... Polygon([(0, 0), (0, 2), (1, 1), (2, 2), (2, 0), (1, 1), (0, 0)]), ... Polygon([(0, 2), (0, 1), (2, 0), (0, 0), (0, 2)]), ... LineString([(0, 0), (1, 1), (1, 0)]), ... ], ... ) >>> s 0 POLYGON ((0 0, 0 2, 1 1, 2 2, 2 0, 1 1, 0 0)) 1 POLYGON ((0 2, 0 1, 2 0, 0 0, 0 2)) 2 LINESTRING (0 0, 1 1, 1 0) dtype: geometry
>>> s.make_valid(method='structure') 0 MULTIPOLYGON (((1 1, 0 0, 0 2, 1 1)), ((2 0, 1... 1 POLYGON ((0 1, 2 0, 0 0, 0 1)) 2 LINESTRING (0 0, 1 1, 1 0) dtype: geometry
- overlaps(other, align=None)[source]
Returns True for all aligned geometries that overlap other, else False.
In GeoPandas, geometries overlap if they have more than one but not all points in common, have the same dimension, and the intersection of the interiors of the geometries has the same dimension as the geometries themselves.
Sedona, however, also returns True when the points of the two geometries match.
Note: Sedona’s behavior may also differ from Geopandas for GeometryCollections.
The operation works in a 1-to-1 row-wise manner.
- Parameters:
- Return type:
Series (bool)
Examples
>>> from sedona.spark.geopandas import GeoSeries >>> from shapely.geometry import Polygon, LineString, MultiPoint, Point >>> s = GeoSeries( ... [ ... Polygon([(0, 0), (2, 2), (0, 2)]), ... Polygon([(0, 0), (2, 2), (0, 2)]), ... LineString([(0, 0), (2, 2)]), ... MultiPoint([(0, 0), (0, 1)]), ... ], ... ) >>> s2 = GeoSeries( ... [ ... Polygon([(0, 0), (2, 0), (0, 2)]), ... LineString([(0, 1), (1, 1)]), ... LineString([(1, 1), (3, 3)]), ... Point(0, 1), ... ], ... index=range(1, 5), ... )
We can check if each geometry of GeoSeries overlaps a single geometry:
>>> polygon = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]) >>> s.overlaps(polygon) 0 True 1 True 2 False 3 False dtype: bool
We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We align both GeoSeries based on index values and compare elements with the same index.
>>> s.overlaps(s2) 0 False 1 True 2 False 3 False 4 False dtype: bool
>>> s.overlaps(s2, align=False) 0 True 1 False 2 True 3 False dtype: bool
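The difference between the two align modes can be modelled without any geometry at all. The sketch below is a simplified pure-Python model of the behaviour shown in the outputs above, not Sedona's implementation: pairwise is a hypothetical helper, plain dicts stand in for indexed GeoSeries, and string equality stands in for the spatial predicate.

```python
def pairwise(left, right, predicate, align=True):
    """Apply predicate row by row to two {index: value} mappings.

    align=True  -> pair rows by index label; unmatched labels yield False.
    align=False -> pair rows by position and keep the left-hand labels.
    """
    if align:
        labels = sorted(set(left) | set(right))
        return {
            k: (k in left and k in right and predicate(left[k], right[k]))
            for k in labels
        }
    return {
        k: predicate(a, b)
        for (k, a), b in zip(left.items(), right.values())
    }


# Stand-in data: strings instead of geometries, equality instead of overlaps.
s = {0: "a", 1: "b", 2: "c", 3: "d"}
s2 = {1: "a", 2: "c", 3: "c", 4: "e"}

aligned = pairwise(s, s2, lambda a, b: a == b, align=True)
positional = pairwise(s, s2, lambda a, b: a == b, align=False)
print(aligned)
print(positional)
```

With alignment the result covers the union of both indices (five rows here), and rows present on only one side come out as False. Without alignment the result has as many rows as the inputs and compares first with first, second with second, and so on.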
Notes
This method works in a row-wise manner. It does not check if an element of one GeoSeries overlaps any element of the other one.
See also
GeoSeries.crosses, GeoSeries.intersects
- segmentize(max_segment_length)[source]
Returns a GeoSeries with vertices added to line segments based on maximum segment length.
Additional vertices will be added to every line segment in an input geometry so that segments are no longer than the provided maximum segment length. New vertices will evenly subdivide each segment. Only linear components of input geometries are densified; other geometries are returned unmodified.
- Parameters:
max_segment_length (float | array-like) – Additional vertices will be added so that all line segments are no longer than this value. Must be greater than 0.
- Return type:
GeoSeries
Examples
>>> from sedona.spark.geopandas import GeoSeries >>> from shapely.geometry import Polygon, LineString >>> s = GeoSeries( ... [ ... LineString([(0, 0), (0, 10)]), ... Polygon([(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]), ... ], ... ) >>> s 0 LINESTRING (0 0, 0 10) 1 POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0)) dtype: geometry
>>> s.segmentize(max_segment_length=5) 0 LINESTRING (0 0, 0 5, 0 10) 1 POLYGON ((0 0, 5 0, 10 0, 10 5, 10 10, 5 10, 0... dtype: geometry
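The even subdivision described above can be sketched in a few lines of pure Python. This is an illustration of the idea under simple assumptions (2D coordinates, a single scalar max_segment_length), not the routine Sedona runs; the function below is a hypothetical stand-in that operates on a plain coordinate list.

```python
import math


def segmentize(coords, max_segment_length):
    """Insert evenly spaced vertices so no segment exceeds the maximum."""
    if max_segment_length <= 0:
        raise ValueError("max_segment_length must be greater than 0")
    out = [coords[0]]
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        # Number of equal pieces needed for this segment (at least one).
        n = max(1, math.ceil(math.dist((x0, y0), (x1, y1)) / max_segment_length))
        for i in range(1, n + 1):
            t = i / n
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


densified = segmentize([(0, 0), (0, 10)], 5)
print(densified)
```

For the LineString in the example, one vertex is added at the midpoint, matching row 0 of the output above.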
- simplify(tolerance=None, preserve_topology=True)[source]
Returns a GeoSeries containing a simplified representation of each geometry.
The algorithm (Douglas-Peucker) recursively splits the original line into smaller parts and connects these parts’ endpoints by a straight line. Then, it removes all points whose distance to the straight line is smaller than tolerance. It does not move any points and it always preserves endpoints of the original line or polygon. See https://shapely.readthedocs.io/en/latest/manual.html#object.simplify for details.
Simplifies individual geometries independently, without considering the topology of a potential polygonal coverage. If you would like to treat the GeoSeries as a coverage and simplify its edges, while preserving the coverage topology, see simplify_coverage().
- Parameters:
tolerance (float) – All parts of a simplified geometry will be no more than tolerance distance from the original. It has the same units as the coordinate reference system of the GeoSeries. For example, using tolerance=100 in a projected CRS with meters as units means a distance of 100 meters in reality.
preserve_topology (bool (default True)) – False uses a quicker algorithm, but may produce self-intersecting or otherwise invalid geometries.
Notes
Invalid geometric objects may result from simplification that does not preserve topology and simplification may be sensitive to the order of coordinates: two geometries differing only in order of coordinates may be simplified differently.
See also
simplify_coverage
simplify geometries using coverage simplification
Examples
>>> from sedona.spark.geopandas import GeoSeries >>> from shapely.geometry import Point, LineString >>> s = GeoSeries( ... [Point(0, 0).buffer(1), LineString([(0, 0), (1, 10), (0, 20)])] ... ) >>> s 0 POLYGON ((1 0, 0.99518 -0.09802, 0.98079 -0.19... 1 LINESTRING (0 0, 1 10, 0 20) dtype: geometry
>>> s.simplify(1) 0 POLYGON ((0 1, 0 -1, -1 0, 0 1)) 1 LINESTRING (0 0, 0 20) dtype: geometry
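For readers unfamiliar with Douglas-Peucker, the recursion described above can be written as a short pure-Python sketch. It is a textbook version operating on a coordinate list, not the code Sedona executes, and the function names are hypothetical. One assumption is worth flagging: the sketch drops points whose distance is less than or equal to the tolerance, chosen so that it reproduces the LineString result shown in the example.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.dist(p, (ax + t * dx, ay + t * dy))


def douglas_peucker(coords, tolerance):
    """Simplify a polyline, always keeping its first and last vertex."""
    if len(coords) < 3:
        return list(coords)
    first, last = coords[0], coords[-1]
    distances = [_point_segment_distance(p, first, last) for p in coords[1:-1]]
    worst = max(range(len(distances)), key=distances.__getitem__)
    if distances[worst] <= tolerance:  # assumption: "<=" rather than "<"
        return [first, last]
    split = worst + 1  # index of the farthest vertex in coords
    left = douglas_peucker(coords[: split + 1], tolerance)
    right = douglas_peucker(coords[split:], tolerance)
    return left[:-1] + right  # avoid repeating the split vertex


from_example = douglas_peucker([(0, 0), (1, 10), (0, 20)], 1)
zigzag = douglas_peucker([(0, 0), (1, 0.2), (2, 3), (3, 0.1), (4, 0)], 1.0)
print(from_example)
print(zigzag)
```

In the second call the peak at (2, 3) survives because it lies far from the straight line between the endpoints, while the two small bumps are removed.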
- property sindex: SpatialIndex
Returns a spatial index for the GeoSeries.
Note that the spatial index may not be fully initialized until the first use.
Currently, sindex is not retained when it is accessed from a GeoDataFrame. You can work around this by first extracting the active geometry column as a GeoSeries and accessing sindex on that GeoSeries.
- Returns:
The spatial index.
- Return type:
SpatialIndex
Examples
>>> from shapely.geometry import Point, box >>> from sedona.spark.geopandas import GeoSeries >>> >>> s = GeoSeries([Point(x, x) for x in range(5)]) >>> s.sindex.query(box(1, 1, 3, 3)) [Point(1, 1), Point(2, 2), Point(3, 3)] >>> s.has_sindex True
- snap(other, tolerance, align=None)[source]
Snap the vertices and segments of the geometry to vertices of the reference.
Vertices and segments of the input geometry are snapped to vertices of the reference geometry, returning a new geometry; the input geometries are not modified. The result geometry is the input geometry with the vertices and segments snapped. If no snapping occurs then the input geometry is returned unchanged. The tolerance is used to control where snapping is performed.
Where possible, this operation tries to avoid creating invalid geometries; however, it does not guarantee that output geometries will be valid. It is the responsibility of the caller to check for and handle invalid geometries.
Because too much snapping can result in invalid geometries being created, heuristics are used to determine the number and location of snapped vertices that are likely safe to snap. These heuristics may omit some potential snaps that are otherwise within the tolerance.
Note: Sedona’s result may differ slightly from geopandas’s snap() result because of small differences between the underlying engines being used.
The operation works in a 1-to-1 row-wise manner.
- Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to snap to.
tolerance (float or array like) – Maximum distance between vertices that shall be snapped.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.
- Return type:
GeoSeries
Examples
>>> from sedona.spark.geopandas import GeoSeries >>> from shapely import Polygon, LineString, Point >>> s = GeoSeries( ... [ ... Point(0.5, 2.5), ... LineString([(0.1, 0.1), (0.49, 0.51), (1.01, 0.89)]), ... Polygon([(0, 0), (0, 10), (10, 10), (10, 0), (0, 0)]), ... ], ... ) >>> s 0 POINT (0.5 2.5) 1 LINESTRING (0.1 0.1, 0.49 0.51, 1.01 0.89) 2 POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0)) dtype: geometry
>>> s2 = GeoSeries( ... [ ... Point(0, 2), ... LineString([(0, 0), (0.5, 0.5), (1.0, 1.0)]), ... Point(8, 10), ... ], ... index=range(1, 4), ... ) >>> s2 1 POINT (0 2) 2 LINESTRING (0 0, 0.5 0.5, 1 1) 3 POINT (8 10) dtype: geometry
We can snap each geometry to a single shapely geometry:
>>> s.snap(Point(0, 2), tolerance=1) 0 POINT (0 2) 1 LINESTRING (0.1 0.1, 0.49 0.51, 1.01 0.89) 2 POLYGON ((0 0, 0 2, 0 10, 10 10, 10 0, 0 0)) dtype: geometry
We can also snap two GeoSeries to each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and snap elements with the same index using align=True or ignore index and snap elements based on their matching order using align=False:
>>> s.snap(s2, tolerance=1, align=True) 0 None 1 LINESTRING (0.1 0.1, 0.49 0.51, 1.01 0.89) 2 POLYGON ((0.5 0.5, 1 1, 0 10, 10 10, 10 0, 0.5... 3 None dtype: geometry
>>> s.snap(s2, tolerance=1, align=False) 0 POINT (0 2) 1 LINESTRING (0 0, 0.5 0.5, 1 1) 2 POLYGON ((0 0, 0 10, 8 10, 10 10, 10 0, 0 0)) dtype: geometry
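The role of the tolerance can be illustrated with a deliberately simplified pure-Python sketch that only moves existing vertices onto the nearest reference vertex. This is not the engine's algorithm: it omits segment snapping (which is what inserts the extra vertex into the polygon in the examples above) and the safety heuristics, and both the helper name and the "less than or equal" comparison are assumptions made for the illustration.

```python
import math


def snap_vertices(coords, reference, tolerance):
    """Move each vertex to the nearest reference vertex within tolerance."""
    snapped = []
    for p in coords:
        nearest = min(reference, key=lambda r: math.dist(p, r))
        # Assumption: a vertex exactly at the tolerance distance is snapped.
        snapped.append(nearest if math.dist(p, nearest) <= tolerance else p)
    return snapped


# Point(0.5, 2.5) is about 0.71 away from Point(0, 2), so it snaps.
moved = snap_vertices([(0.5, 2.5)], [(0, 2)], tolerance=1)

# Every vertex of the LineString is more than 1 away, so nothing changes.
line = [(0.1, 0.1), (0.49, 0.51), (1.01, 0.89)]
kept = snap_vertices(line, [(0, 2)], tolerance=1)
print(moved, kept)
```

These two cases mirror rows 0 and 1 of the single-geometry example above.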
- abstractmethod to_geopandas() GeoSeries | GeoDataFrame [source]
- property total_bounds
Returns a tuple containing minx, miny, maxx, maxy values for the bounds of the series as a whole.
See GeoSeries.bounds for the bounds of the geometries contained in the series.
Examples
>>> from shapely.geometry import Point, Polygon, LineString >>> from sedona.spark.geopandas import GeoDataFrame >>> d = {'geometry': [Point(3, -1), Polygon([(0, 0), (1, 1), (1, 0)]), ... LineString([(0, 1), (1, 2)])]} >>> gdf = GeoDataFrame(d, crs="EPSG:4326") >>> gdf.total_bounds array([ 0., -1., 3., 2.])
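The relationship between bounds and total_bounds is a simple reduction: take the minimum of the per-geometry minima and the maximum of the per-geometry maxima. A small pure-Python sketch, using the bounds of the three geometries from the example (the function here is a hypothetical stand-in, not the Sedona property):

```python
def total_bounds(bounds):
    """Reduce per-geometry (minx, miny, maxx, maxy) rows to one envelope."""
    minxs, minys, maxxs, maxys = zip(*bounds)
    return (min(minxs), min(minys), max(maxxs), max(maxys))


per_geometry = [
    (3.0, -1.0, 3.0, -1.0),  # Point(3, -1)
    (0.0, 0.0, 1.0, 1.0),    # Polygon([(0, 0), (1, 1), (1, 0)])
    (0.0, 1.0, 1.0, 2.0),    # LineString([(0, 1), (1, 2)])
]
envelope = total_bounds(per_geometry)
print(envelope)
```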
- touches(other, align=None)[source]
Returns a Series of dtype('bool') with value True for each aligned geometry that touches other.
An object is said to touch other if it has at least one point in common with other and its interior does not intersect with any part of the other. Overlapping features therefore do not touch.
Note: Sedona’s behavior may differ from Geopandas for GeometryCollections.
The operation works in a 1-to-1 row-wise manner.
- Parameters:
- Return type:
Series (bool)
Examples
>>> from sedona.spark.geopandas import GeoSeries >>> from shapely.geometry import Polygon, LineString, MultiPoint, Point >>> s = GeoSeries( ... [ ... Polygon([(0, 0), (2, 2), (0, 2)]), ... Polygon([(0, 0), (2, 2), (0, 2)]), ... LineString([(0, 0), (2, 2)]), ... MultiPoint([(0, 0), (0, 1)]), ... ], ... ) >>> s2 = GeoSeries( ... [ ... Polygon([(0, 0), (-2, 0), (0, -2)]), ... LineString([(0, 1), (1, 1)]), ... LineString([(1, 1), (3, 0)]), ... Point(0, 1), ... ], ... index=range(1, 5), ... )
>>> s 0 POLYGON ((0 0, 2 2, 0 2, 0 0)) 1 POLYGON ((0 0, 2 2, 0 2, 0 0)) 2 LINESTRING (0 0, 2 2) 3 MULTIPOINT ((0 0), (0 1)) dtype: geometry
>>> s2 1 POLYGON ((0 0, -2 0, 0 -2, 0 0)) 2 LINESTRING (0 1, 1 1) 3 LINESTRING (1 1, 3 0) 4 POINT (0 1) dtype: geometry
We can check if each geometry of GeoSeries touches a single geometry:
>>> line = LineString([(0, 0), (-1, -2)]) >>> s.touches(line) 0 True 1 True 2 True 3 True dtype: bool
We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:
>>> s.touches(s2, align=True) 0 False 1 True 2 True 3 False 4 False dtype: bool
>>> s.touches(s2, align=False) 0 True 1 False 2 True 3 False dtype: bool
Notes
This method works in a row-wise manner. It does not check if an element of one GeoSeries touches any element of the other one.
See also
GeoSeries.overlaps, GeoSeries.intersects
- abstract property type
- union_all(method='unary', grid_size=None) BaseGeometry [source]
Returns a geometry containing the union of all geometries in the GeoSeries.
Sedona does not support the method or grid_size arguments, so the user does not need to choose the algorithm manually.
- Parameters:
method (str (default "unary")) – Not supported in Sedona.
grid_size (float, default None) – Not supported in Sedona.
Examples
>>> from sedona.spark.geopandas import GeoSeries >>> from shapely.geometry import box >>> s = GeoSeries([box(0, 0, 1, 1), box(0, 0, 2, 2)]) >>> s 0 POLYGON ((1 0, 1 1, 0 1, 0 0, 1 0)) 1 POLYGON ((2 0, 2 2, 0 2, 0 0, 2 0)) dtype: geometry
>>> s.union_all() <POLYGON ((0 1, 0 2, 2 2, 2 0, 1 0, 0 0, 0 1))>
- within(other, align=None)[source]
Returns a Series of dtype('bool') with value True for each aligned geometry that is within other.
An object is said to be within other if at least one of its points is located in the interior and no points are located in the exterior of the other. If either object is empty, this operation returns False.
This is the inverse of contains in the sense that the expression a.within(b) == b.contains(a) always evaluates to True.
Note: Sedona’s behavior may differ from Geopandas for GeometryCollections and for geometries that are equal.
The operation works in a 1-to-1 row-wise manner.
- Parameters:
- Return type:
Series (bool)
Examples
>>> from sedona.spark.geopandas import GeoSeries >>> from shapely.geometry import Polygon, LineString, Point >>> s = GeoSeries( ... [ ... Polygon([(0, 0), (2, 2), (0, 2)]), ... Polygon([(0, 0), (1, 2), (0, 2)]), ... LineString([(0, 0), (0, 2)]), ... Point(0, 1), ... ], ... ) >>> s2 = GeoSeries( ... [ ... Polygon([(0, 0), (1, 1), (0, 1)]), ... LineString([(0, 0), (0, 2)]), ... LineString([(0, 0), (0, 1)]), ... Point(0, 1), ... ], ... index=range(1, 5), ... )
>>> s 0 POLYGON ((0 0, 2 2, 0 2, 0 0)) 1 POLYGON ((0 0, 1 2, 0 2, 0 0)) 2 LINESTRING (0 0, 0 2) 3 POINT (0 1) dtype: geometry
>>> s2 1 POLYGON ((0 0, 1 1, 0 1, 0 0)) 2 LINESTRING (0 0, 0 2) 3 LINESTRING (0 0, 0 1) 4 POINT (0 1) dtype: geometry
We can check if each geometry of GeoSeries is within a single geometry:
>>> polygon = Polygon([(0, 0), (2, 2), (0, 2)]) >>> s.within(polygon) 0 True 1 True 2 False 3 False dtype: bool
We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:
>>> s2.within(s) 0 False 1 False 2 True 3 False 4 False dtype: bool
>>> s2.within(s, align=False) 1 True 2 False 3 True 4 True dtype: bool
Notes
This method works in a row-wise manner. It does not check if an element of one GeoSeries is within any element of the other one.
See also
GeoSeries.contains
sedona.spark.geopandas.geodataframe module
- class sedona.spark.geopandas.geodataframe.GeoDataFrame(*args: Any, **kwargs: Any)[source]
-
A pandas-on-Spark DataFrame for geospatial data with geometry columns.
GeoDataFrame extends pyspark.pandas.DataFrame to provide geospatial operations using Apache Sedona’s spatial functions. It maintains compatibility with GeoPandas GeoDataFrame while operating on distributed datasets.
- Parameters:
data (dict, array-like, DataFrame, or GeoDataFrame) – Data to initialize the GeoDataFrame. Can be a dictionary, array-like structure, pandas DataFrame, GeoPandas GeoDataFrame, or another GeoDataFrame.
geometry (str, array-like, or GeoSeries, optional) – Column name, array of geometries, or GeoSeries to use as the active geometry. If None, will look for existing geometry columns.
crs (pyproj.CRS, optional) – Coordinate Reference System for the geometries.
columns (Index or array-like, optional) – Column labels to use for the resulting frame.
index (Index or array-like, optional) – Index to use for the resulting frame.
Examples
>>> from shapely.geometry import Point, Polygon >>> from sedona.spark.geopandas import GeoDataFrame >>> import pandas as pd >>> >>> # Create from dictionary with geometry >>> data = { ... 'name': ['A', 'B', 'C'], ... 'geometry': [Point(0, 0), Point(1, 1), Point(2, 2)] ... } >>> gdf = GeoDataFrame(data, crs='EPSG:4326') >>> gdf name geometry 0 A POINT (0 0) 1 B POINT (1 1) 2 C POINT (2 2) >>> >>> # Spatial operations >>> buffered = gdf.buffer(0.1) >>> buffered.area 0 0.031416 1 0.031416 2 0.031416 dtype: float64 >>> >>> # Spatial joins >>> polygons = GeoDataFrame({ ... 'region': ['Region1', 'Region2'], ... 'geometry': [ ... Polygon([(-1, -1), (1, -1), (1, 1), (-1, 1)]), ... Polygon([(0.5, 0.5), (2.5, 0.5), (2.5, 2.5), (0.5, 2.5)]) ... ] ... }) >>> result = gdf.sjoin(polygons, how='left', predicate='within') >>> result['region'] 0 Region1 1 Region2 2 Region2 dtype: object
Notes
This implementation differs from GeoPandas in several ways:
- Uses Spark for distributed processing
- Geometries are stored in WKB (Well-Known Binary) format internally
- Some methods may have different performance characteristics
- Not all GeoPandas methods are implemented yet (see IMPLEMENTATION_STATUS)
Performance Considerations:
- Operations are distributed across Spark cluster
- Avoid converting to GeoPandas (.to_geopandas()) on large datasets
- Use .sample() for testing with large datasets
- Spatial joins are optimized for distributed processing
Geometry Column Management:
- Supports multiple geometry columns
- One geometry column is designated as ‘active’ at a time
- Active geometry is used for spatial operations and plotting
- Use set_geometry() to change the active geometry column
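As background on the WKB format mentioned in these notes, the sketch below encodes and decodes a single 2D point by hand using only the standard library. It illustrates the standard WKB byte layout for a point; how Sedona serializes geometries internally is not shown here, and the two helper functions are hypothetical.

```python
import struct


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB."""
    # 1 byte: byte order (1 = little-endian); 4 bytes: geometry type
    # (1 = Point); then two 8-byte IEEE 754 doubles for x and y.
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(data):
    """Decode the little-endian WKB produced by point_to_wkb."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", data)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("only little-endian 2D points are handled in this sketch")
    return x, y


encoded = point_to_wkb(1.0, 2.0)
print(len(encoded), encoded.hex())
print(wkb_to_point(encoded))
```

A point takes 21 bytes in this encoding, which is one reason binary storage is more compact and faster to parse than the text form POINT (1 2).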
See also
geopandas.GeoDataFrame
The GeoPandas equivalent
sedona.spark.geopandas.GeoSeries
Series with geometry data
- __init__(data=None, index=None, columns=None, dtype=None, copy=False, geometry: Any | None = None, crs: Any | None = None, **kwargs)[source]
- property active_geometry_name: Any
Return the name of the active geometry column
Returns a name if a GeoDataFrame has an active geometry column set, otherwise returns None. The return type is usually a string, but may be an integer, tuple or other hashable, depending on the contents of the dataframe columns.
You can also access the active geometry column using the .geometry property. You can set a GeoSeries to be an active geometry using the set_geometry() method.
- Returns:
name of an active geometry column or None
- Return type:
str or other index label supported by pandas
See also
GeoDataFrame.set_geometry
set the active geometry
- copy(deep=False) GeoDataFrame [source]
Make a copy of this GeoDataFrame object.
- Parameters:
deep (bool, default False) – Not supported. This is a dummy parameter kept only to match the pandas signature.
- Returns:
A copy of this GeoDataFrame object.
- Return type:
Examples
>>> from shapely.geometry import Point >>> from sedona.spark.geopandas import GeoDataFrame
>>> gdf = GeoDataFrame([{"geometry": Point(1, 1), "value1": 2, "value2": 3}]) >>> gdf_copy = gdf.copy() >>> print(gdf_copy) geometry value1 value2 0 POINT (1 1) 2 3
- property crs
- classmethod from_arrow(table, geometry: str | None = None, to_pandas_kwargs: dict | None = None)[source]
Construct a GeoDataFrame from an Arrow table object based on GeoArrow extension types.
See https://geoarrow.org/ for details on the GeoArrow specification.
This function accepts any tabular Arrow object implementing the Arrow PyCapsule Protocol (i.e. having an __arrow_c_array__ or __arrow_c_stream__ method).
Added in version 1.0.
- Parameters:
table (pyarrow.Table or Arrow-compatible table) – Any tabular object implementing the Arrow PyCapsule Protocol (i.e. has an __arrow_c_array__ or __arrow_c_stream__ method). This table should have at least one column with a geoarrow geometry type.
geometry (str, default None) – The name of the geometry column to set as the active geometry column. If None, the first geometry column found will be used.
to_pandas_kwargs (dict, optional) – Arguments passed to the pa.Table.to_pandas method for non-geometry columns. This can be used to control the behavior of the conversion of the non-geometry columns to a pandas DataFrame. For example, you can use this to control the dtype conversion of the columns. By default, the to_pandas method is called with no additional arguments.
- Return type:
GeoDataFrame
See also
GeoDataFrame.to_arrow, GeoSeries.from_arrow
Examples
>>> from sedona.spark.geopandas import GeoDataFrame >>> import geoarrow.pyarrow as ga # requires: pip install geoarrow-pyarrow >>> import pyarrow as pa # requires: pip install pyarrow >>> table = pa.Table.from_arrays([ ... ga.as_geoarrow([None, "POLYGON ((0 0, 1 1, 0 1, 0 0))", "LINESTRING (0 0, -1 1, 0 -1)"]), ... pa.array([1, 2, 3]), ... pa.array(["a", "b", "c"]), ... ], names=["geometry", "id", "value"]) >>> gdf = GeoDataFrame.from_arrow(table) >>> gdf geometry id value 0 None 1 a 1 POLYGON ((0 0, 1 1, 0 1, 0 0)) 2 b 2 LINESTRING (0 0, -1 1, 0 -1) 3 c
- classmethod from_dict(data: dict, geometry=None, crs: Any | None = None, **kwargs) GeoDataFrame [source]
- classmethod from_features(features, crs: Any | None = None, columns: Iterable[str] | None = None) GeoDataFrame [source]
- classmethod from_file(filename: str, format: str | None = None, **kwargs) GeoDataFrame [source]
Alternate constructor to create a GeoDataFrame from a file.
- Parameters:
filename (str) – File path or file handle to read from. If the path is a directory, Sedona will read all files in that directory.
format (str, optional) – The format of the file to read, by default None. If None, Sedona infers the format from the file extension. Note that format inference is not supported for directories. Available formats are “shapefile”, “geojson”, “geopackage”, and “geoparquet”.
table_name (str, optional) – The name of the table to read from a GeoPackage file, by default None. This is required if format is “geopackage”.
**kwargs – Additional keyword arguments passed to the file reader.
- Returns:
A new GeoDataFrame created from the file.
- Return type:
GeoDataFrame
See also
GeoDataFrame.to_file
Write a GeoDataFrame to a file.
- classmethod from_postgis(sql: str | sqlalchemy.text, con, geom_col: str = 'geom', crs: Any | None = None, index_col: str | list[str] | None = None, coerce_float: bool = True, parse_dates: list | dict | None = None, params: list | tuple | dict | None = None, chunksize: int | None = None) GeoDataFrame [source]
- iterfeatures(na: str = 'null', show_bbox: bool = False, drop_id: bool = False) Generator[dict] [source]
- plot(*args, **kwargs)[source]
Plot a GeoDataFrame.
Generate a plot of a GeoDataFrame with matplotlib. If a column is specified, the plot coloring will be based on values in that column.
Note: This method is not scalable and requires collecting all data to the driver.
- Parameters:
column (str, np.array, pd.Series, pd.Index (default None)) – The name of the dataframe column, np.array, pd.Series, or pd.Index to be plotted. If np.array, pd.Series, or pd.Index are used then it must have same length as dataframe. Values are used to color the plot. Ignored if color is also set.
kind (str) – The kind of plots to produce. The default is to create a map (“geo”). Other supported kinds of plots from pandas:
‘line’ : line plot
‘bar’ : vertical bar plot
‘barh’ : horizontal bar plot
‘hist’ : histogram
‘box’ : BoxPlot
‘kde’ : Kernel Density Estimation plot
‘density’ : same as ‘kde’
‘area’ : area plot
‘pie’ : pie plot
‘scatter’ : scatter plot
‘hexbin’ : hexbin plot
cmap (str (default None)) – The name of a colormap recognized by matplotlib.
color (str, np.array, pd.Series (default None)) – If specified, all objects will be colored uniformly.
ax (matplotlib.pyplot.Artist (default None)) – axes on which to draw the plot
cax (matplotlib.pyplot Artist (default None)) – axes on which to draw the legend in case of color map.
categorical (bool (default False)) – If False, cmap will reflect numerical values of the column being plotted. For non-numerical columns, this will be set to True.
legend (bool (default False)) – Plot a legend. Ignored if no column is given, or if color is given.
scheme (str (default None)) – Name of a choropleth classification scheme (requires mapclassify). A mapclassify.MapClassifier object will be used under the hood. Supported are all schemes provided by mapclassify (e.g. ‘BoxPlot’, ‘EqualInterval’, ‘FisherJenks’, ‘FisherJenksSampled’, ‘HeadTailBreaks’, ‘JenksCaspall’, ‘JenksCaspallForced’, ‘JenksCaspallSampled’, ‘MaxP’, ‘MaximumBreaks’, ‘NaturalBreaks’, ‘Quantiles’, ‘Percentiles’, ‘StdMean’, ‘UserDefined’). Arguments can be passed in classification_kwds.
k (int (default 5)) – Number of classes (ignored if scheme is None)
vmin (None or float (default None)) – Minimum value of cmap. If None, the minimum data value in the column to be plotted is used.
vmax (None or float (default None)) – Maximum value of cmap. If None, the maximum data value in the column to be plotted is used.
markersize (str or float or sequence (default None)) – Only applies to point geometries within a frame. If a str, will use the values in the column of the frame specified by markersize to set the size of markers. Otherwise can be a value to apply to all points, or a sequence of the same length as the number of points.
figsize (tuple of integers (default None)) – Size of the resulting matplotlib.figure.Figure. If the argument ax is given explicitly, figsize is ignored.
legend_kwds (dict (default None)) – Keyword arguments to pass to matplotlib.pyplot.legend() or matplotlib.pyplot.colorbar(). Additional accepted keywords when scheme is specified:
- fmt : string
A formatting specification for the bin edges of the classes in the legend. For example, to have no decimals: {"fmt": "{:.0f}"}.
- labels : list-like
A list of legend labels to override the auto-generated labels. Needs to have the same number of elements as the number of classes (k).
- interval : boolean (default False)
An option to control brackets from mapclassify legend. If True, open/closed interval brackets are shown in the legend.
categories (list-like) – Ordered list-like object of categories to be used for categorical plot.
classification_kwds (dict (default None)) – Keyword arguments to pass to mapclassify
missing_kwds (dict (default None)) – Keyword arguments specifying color options (as style_kwds) to be passed on to geometries with missing values in addition to or overwriting other style kwds. If None, geometries with missing values are not plotted.
aspect ('auto', 'equal', None or float (default 'auto')) – Set aspect of axis. If ‘auto’, the default aspect for map plots is ‘equal’; if however data are not projected (coordinates are long/lat), the aspect is by default set to 1/cos(df_y * pi/180) with df_y the y coordinate of the middle of the GeoDataFrame (the mean of the y range of bounding box) so that a long/lat square appears square in the middle of the plot. This implies an Equirectangular projection. If None, the aspect of ax won’t be changed. It can also be set manually (float) as the ratio of y-unit to x-unit.
autolim (bool (default True)) – Update axes data limits to contain the new geometries.
**style_kwds (dict) – Style options to be passed on to the actual plot function, such as edgecolor, facecolor, linewidth, markersize, alpha.
- Returns:
ax
- Return type:
matplotlib axes instance
Examples
>>> import geodatasets # requires: pip install geodatasets >>> import geopandas as gpd >>> df = gpd.read_file(geodatasets.get_path("nybb")) >>> df.head() BoroCode ... geometry 0 5 ... MULTIPOLYGON (((970217.022 145643.332, 970227.... 1 4 ... MULTIPOLYGON (((1029606.077 156073.814, 102957... 2 3 ... MULTIPOLYGON (((1021176.479 151374.797, 102100... 3 1 ... MULTIPOLYGON (((981219.056 188655.316, 980940.... 4 2 ... MULTIPOLYGON (((1012821.806 229228.265, 101278...
>>> df.plot("BoroName", cmap="Set1")
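The aspect value used for unprojected (long/lat) data, 1/cos(df_y * pi/180), is easy to evaluate by hand. The sketch below computes it for two latitude ranges; lonlat_aspect is a hypothetical helper that only mirrors the formula quoted in the aspect parameter description and is not part of the plotting API.

```python
import math


def lonlat_aspect(miny, maxy):
    """Aspect ratio 1 / cos(df_y * pi / 180) at the mid-latitude of the bounds."""
    df_y = (miny + maxy) / 2  # mean of the y range of the bounding box
    return 1 / math.cos(df_y * math.pi / 180)


equator = lonlat_aspect(-10, 10)      # centred on latitude 0
mid_latitude = lonlat_aspect(50, 70)  # centred on latitude 60
print(equator, round(mid_latitude, 3))
```

At the equator the ratio is 1, so degrees of longitude and latitude are drawn at the same scale. At 60 degrees the ratio is about 2, because a degree of longitude there covers roughly half the ground distance of a degree of latitude.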
- rename_geometry(col: str, inplace: Literal[True] = False) None [source]
- rename_geometry(col: str, inplace: Literal[False] = False) GeoDataFrame
Renames the GeoDataFrame geometry column to the specified name. By default yields a new object.
The original geometry column is replaced with the input.
- Parameters:
col (new geometry column label)
inplace (boolean, default False) – Modify the GeoDataFrame in place (without creating a new object)
Examples
>>> from sedona.spark.geopandas import GeoDataFrame >>> from shapely.geometry import Point >>> d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]} >>> df = GeoDataFrame(d, crs="EPSG:4326") >>> df1 = df.rename_geometry('geom1') >>> df1.geometry.name 'geom1' >>> df.rename_geometry('geom1', inplace=True) >>> df.geometry.name 'geom1'
See also
GeoDataFrame.set_geometry
set the active geometry
- set_crs(crs, inplace=False, allow_override=True)[source]
Set the Coordinate Reference System (CRS) of the GeoDataFrame.
If there are multiple geometry columns within the GeoDataFrame, only the CRS of the active geometry column is set.
Pass None to remove CRS from the active geometry column.
Notes
The underlying geometries are not transformed to this CRS. To transform the geometries to a new CRS, use the to_crs method.
- Parameters:
crs (pyproj.CRS | None, optional) – The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g. “EPSG:4326”) or a WKT string.
epsg (int, optional) – EPSG code specifying the projection.
inplace (bool, default False) – If True, the CRS of the GeoDataFrame will be changed in place (while still returning the result) instead of making a copy of the GeoDataFrame.
allow_override (bool, default True) – If the GeoDataFrame already has a CRS, allow to replace the existing CRS, even when both are not equal. In Sedona, setting this to True will lead to eager evaluation instead of lazy evaluation. Unlike Geopandas, True is the default value in Sedona for performance reasons.
Examples
>>> from sedona.spark.geopandas import GeoDataFrame
>>> from shapely.geometry import Point
>>> d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> gdf = GeoDataFrame(d)
>>> gdf
    col1     geometry
0  name1  POINT (1 2)
1  name2  POINT (2 1)
Setting CRS to a GeoDataFrame without one:
>>> gdf.crs is None
True
>>> gdf = gdf.set_crs('epsg:3857')
>>> gdf.crs
<Projected CRS: EPSG:3857>
Name: WGS 84 / Pseudo-Mercator
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: World - 85°S to 85°N
- bounds: (-180.0, -85.06, 180.0, 85.06)
Coordinate Operation:
- name: Popular Visualisation Pseudo-Mercator
- method: Popular Visualisation Pseudo Mercator
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
Overriding existing CRS:
>>> gdf = gdf.set_crs(4326, allow_override=True)
Without allow_override=True, set_crs returns an error if you try to override CRS.
See also
GeoDataFrame.to_crs
re-project to another CRS
- set_geometry(col, drop: bool | None = None, inplace: Literal[True] = False, crs: Any | None = None) None [source]
- set_geometry(col, drop: bool | None = None, inplace: Literal[False] = False, crs: Any | None = None) GeoDataFrame
Set the GeoDataFrame geometry using either an existing column or the specified input. By default yields a new object.
The original geometry column is replaced with the input.
- Parameters:
col (column label or array-like) – An existing column name or values to set as the new geometry column. If values (array-like, (Geo)Series) are passed, then if they are named (Series) the new geometry column will have the corresponding name, otherwise the existing geometry column will be replaced. If there is no existing geometry column, the new geometry column will use the default name “geometry”.
drop (boolean, default False) –
When specifying a named Series or an existing column name for col, controls if the previous geometry column should be dropped from the result. The default of False keeps both the old and new geometry column.
Deprecated since version 1.0.0.
inplace (boolean, default False) – Modify the GeoDataFrame in place (do not create a new object)
crs (pyproj.CRS, optional) – Coordinate system to use. The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g. “EPSG:4326”) or a WKT string. If passed, overrides both DataFrame and col’s crs. Otherwise, tries to get crs from passed col values or DataFrame.
Examples
>>> from sedona.spark.geopandas import GeoDataFrame
>>> from shapely.geometry import Point
>>> d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> gdf = GeoDataFrame(d, crs="EPSG:4326")
>>> gdf
    col1     geometry
0  name1  POINT (1 2)
1  name2  POINT (2 1)
Passing an array:
>>> df1 = gdf.set_geometry([Point(0,0), Point(1,1)])
>>> df1
    col1     geometry
0  name1  POINT (0 0)
1  name2  POINT (1 1)
Using existing column:
>>> gdf["buffered"] = gdf.buffer(2)
>>> df2 = gdf.set_geometry("buffered")
>>> df2.geometry
0    POLYGON ((3 2, 2.99037 1.80397, 2.96157 1.6098...
1    POLYGON ((4 1, 3.99037 0.80397, 3.96157 0.6098...
Name: buffered, dtype: geometry
- Return type:
See also
GeoDataFrame.rename_geometry
rename an active geometry column
- sjoin(other, how='inner', predicate='intersects', lsuffix='left', rsuffix='right', distance=None, on_attribute=None, **kwargs)[source]
Spatial join of two GeoDataFrames.
- Parameters:
other (GeoDataFrame) – The right GeoDataFrame to join with.
how (str, default 'inner') – The type of join:
* ‘left’: use keys from left_df; retain only left_df geometry column
* ‘right’: use keys from right_df; retain only right_df geometry column
* ‘inner’: use intersection of keys from both dfs; retain only left_df geometry column
predicate (str, default 'intersects') – Binary predicate. Valid values: ‘intersects’, ‘contains’, ‘within’, ‘dwithin’
lsuffix (str, default 'left') – Suffix to apply to overlapping column names (left GeoDataFrame).
rsuffix (str, default 'right') – Suffix to apply to overlapping column names (right GeoDataFrame).
distance (float, optional) – Distance for ‘dwithin’ predicate. Required if predicate=’dwithin’.
on_attribute (str, list or tuple, optional) – Column name(s) to join on as an additional join restriction. These must be found in both DataFrames.
**kwargs – Additional keyword arguments passed to the spatial join function.
- Returns:
A GeoDataFrame with the results of the spatial join.
- Return type:
GeoDataFrame
Examples
>>> from shapely.geometry import Point, Polygon
>>> from sedona.spark.geopandas import GeoDataFrame
>>> polygons = GeoDataFrame({
...     'geometry': [Polygon([(0, 0), (0, 1), (1, 1), (1, 0)])],
...     'value': [1]
... })
>>> points = GeoDataFrame({
...     'geometry': [Point(0.5, 0.5), Point(2, 2)],
...     'value': [1, 2]
... })
>>> joined = points.sjoin(polygons)
>>> joined
     geometry_left  value_left                       geometry_right  value_right
0  POINT (0.5 0.5)           1  POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))            1
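One way to picture an inner join with the 'intersects' predicate is a plain nested loop over both inputs. The sketch below uses only the standard library; `Box`, `intersects_point`, and `sjoin_points_boxes` are hypothetical helpers written for this illustration, not Sedona APIs, and the loop ignores the partitioning and indexing a distributed engine would rely on.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle standing in for a polygon (illustration only)."""
    minx: float
    miny: float
    maxx: float
    maxy: float


def intersects_point(box, x, y):
    # A point intersects a box when it lies inside it or on its boundary.
    return box.minx <= x <= box.maxx and box.miny <= y <= box.maxy


def sjoin_points_boxes(points, boxes, lsuffix="left", rsuffix="right"):
    """Inner join: one output row per (point, box) pair that intersects."""
    rows = []
    for (x, y), point_value in points:
        for box, box_value in boxes:
            if intersects_point(box, x, y):
                # Overlapping column names get the left/right suffixes.
                rows.append({
                    f"geometry_{lsuffix}": (x, y),
                    f"value_{lsuffix}": point_value,
                    f"geometry_{rsuffix}": box,
                    f"value_{rsuffix}": box_value,
                })
    return rows


points = [((0.5, 0.5), 1), ((2.0, 2.0), 2)]
boxes = [(Box(0, 0, 1, 1), 1)]
joined = sjoin_points_boxes(points, boxes)
```

Only the point at (0.5, 0.5) falls inside the unit box, so the result holds a single row, mirroring the one-row output of the documented example.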
- to_arrow(*, index: bool | None = None, geometry_encoding='WKB', interleaved: bool = True, include_z: bool | None = None)[source]
Encode a GeoDataFrame to GeoArrow format.
See https://geoarrow.org/ for details on the GeoArrow specification.
This function returns a generic Arrow data object implementing the Arrow PyCapsule Protocol (i.e. having an __arrow_c_stream__ method; see https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html). This object can then be consumed by your Arrow implementation of choice that supports this protocol.
Note: Requires geopandas versions >= 1.0.0 to use with Sedona.
- Parameters:
index (bool, default None) –
If True, always include the dataframe’s index(es) as columns in the file output. If False, the index(es) will not be written to the file. If None, the index(es) will be included as columns in the file output except RangeIndex which is stored as metadata only.
Note: Unlike in geopandas, None will include the index as a column because Sedona always converts RangeIndex into a general Index.
geometry_encoding ({'WKB', 'geoarrow' }, default 'WKB') – The GeoArrow encoding to use for the data conversion.
interleaved (bool, default True) – Only relevant for ‘geoarrow’ encoding. If True, the geometries’ coordinates are interleaved in a single fixed size list array. If False, the coordinates are stored as separate arrays in a struct type.
include_z (bool, default None) – Only relevant for ‘geoarrow’ encoding (for WKB, the dimensionality of the individual geometries is preserved). If False, return 2D geometries. If True, include the third dimension in the output (if a geometry has no third dimension, the z-coordinates will be NaN). By default, will infer the dimensionality from the input geometries. Note that this inference can be unreliable with empty geometries (for a guaranteed result, it is recommended to specify the keyword).
- Returns:
A generic Arrow table object with geometry columns encoded to GeoArrow.
- Return type:
ArrowTable
Examples
>>> from sedona.spark.geopandas import GeoDataFrame
>>> from shapely.geometry import Point
>>> data = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> gdf = GeoDataFrame(data)
>>> gdf
    col1     geometry
0  name1  POINT (1 2)
1  name2  POINT (2 1)
>>> arrow_table = gdf.to_arrow(index=False)
>>> arrow_table
<geopandas.io._geoarrow.ArrowTable object at ...>
The returned data object needs to be consumed by a library implementing the Arrow PyCapsule Protocol. For example, wrapping the data as a pyarrow.Table (requires pyarrow >= 14.0):
>>> import pyarrow as pa  # requires: pip install pyarrow
>>> table = pa.table(arrow_table)
>>> table
pyarrow.Table
col1: string
geometry: binary
----
col1: [["name1","name2"]]
geometry: [[0101000000000000000000F03F0000000000000040,01010000000000000000000040000000000000F03F]]
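The hexadecimal values in the geometry column above are WKB (Well-Known Binary) encodings of the two points. As a minimal sketch, a 2D point can be encoded by hand with the standard `struct` module; `point_to_wkb_hex` is a hypothetical helper for illustration and handles only little-endian 2D points.

```python
import struct


def point_to_wkb_hex(x, y):
    """Encode a 2D point as little-endian WKB and return uppercase hex."""
    # '<' = little-endian with no padding
    # B   = byte-order flag (1 means little-endian)
    # I   = geometry type as uint32 (1 means Point)
    # dd  = x and y coordinates as 64-bit floats
    return struct.pack("<BIdd", 1, 1, x, y).hex().upper()


wkb_first = point_to_wkb_hex(1.0, 2.0)   # POINT (1 2)
wkb_second = point_to_wkb_hex(2.0, 1.0)  # POINT (2 1)
```

Each result is 21 bytes (42 hex characters): one flag byte, four type bytes, and two eight-byte coordinates.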
- to_crs(crs: Any | None = None, epsg: int | None = None, inplace: bool = False) GeoDataFrame | None [source]
Transform geometries to a new coordinate reference system.
Transform all geometries in an active geometry column to a different coordinate reference system. The crs attribute on the current GeoSeries must be set. Either crs or epsg may be specified for output.
This method will transform all points in all objects. It has no notion of projecting entire geometries. All segments joining points are assumed to be lines in the current projection, not geodesics. Objects crossing the dateline (or other projection boundary) will have undesirable behavior.
- Parameters:
crs (pyproj.CRS, optional if epsg is specified) – The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g. “EPSG:4326”) or a WKT string.
epsg (int, optional if crs is specified) – EPSG code specifying output projection.
inplace (bool, optional, default: False) – Whether to return a new GeoDataFrame or do the transformation in place.
- Return type:
Examples
>>> from shapely.geometry import Point
>>> from sedona.spark.geopandas import GeoDataFrame
>>> d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> gdf = GeoDataFrame(d, crs=4326)
>>> gdf
    col1     geometry
0  name1  POINT (1 2)
1  name2  POINT (2 1)
>>> gdf.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
>>> gdf = gdf.to_crs(3857)
>>> gdf
    col1                       geometry
0  name1  POINT (111319.491 222684.209)
1  name2  POINT (222638.982 111325.143)
>>> gdf.crs
<Projected CRS: EPSG:3857>
Name: WGS 84 / Pseudo-Mercator
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: World - 85°S to 85°N
- bounds: (-180.0, -85.06, 180.0, 85.06)
Coordinate Operation:
- name: Popular Visualisation Pseudo-Mercator
- method: Popular Visualisation Pseudo Mercator
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
See also
GeoDataFrame.set_crs
assign CRS without re-projection
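To see why to_crs changes coordinate values while set_crs does not, it can help to apply the EPSG:4326 to EPSG:3857 transformation by hand. The sketch below uses the standard spherical Web Mercator formulas; `lonlat_to_web_mercator` is a hypothetical helper, and a real reprojection would go through pyproj rather than this closed form.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used as the sphere radius by EPSG:3857


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project a longitude/latitude pair in degrees to EPSG:3857 metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# The two points from the example above, read as (lon, lat).
x1, y1 = lonlat_to_web_mercator(1, 2)
x2, y2 = lonlat_to_web_mercator(2, 1)
```

The results agree with the coordinates shown in the to_crs(3857) output to within rounding, which illustrates that every vertex is transformed individually.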
- to_feather(path, index: bool | None = None, compression: str | None = None, schema_version=None, **kwargs)[source]
- to_file(path: str, driver: str | None = None, schema: dict | None = None, index: bool | None = None, **kwargs)[source]
Write the GeoDataFrame to a file.
- Parameters:
path (str) – File path or file handle to write to.
driver (str, default None) –
The format driver used to write the file. If not specified, it attempts to infer it from the file extension. If no extension is specified, Sedona will error.
Options: “geojson”, “geopackage”, “geoparquet”
schema (dict, default None) – Not applicable to Sedona’s implementation.
index (bool, default None) – If True, write index into one or more columns (for MultiIndex). Default None writes the index into one or more columns only if the index is named, is a MultiIndex, or has a non-integer data type. If False, no index is written.
**kwargs –
Additional keyword arguments:
- mode : str, default ‘w’
The write mode, ‘w’ to overwrite the existing file and ‘a’ to append. ‘overwrite’ and ‘append’ are equivalent to ‘w’ and ‘a’ respectively.
- crs : pyproj.CRS, default None
If specified, the CRS is passed to Fiona to better control how the file is written. If None, GeoPandas will determine the CRS based on the crs attribute. The value can be anything accepted by pyproj.CRS.from_user_input, such as an authority string (e.g., “EPSG:4326”) or a WKT string.
- engine : str
Not applicable to Sedona’s implementation.
- metadata : dict[str, str], default None
Optional metadata to be stored in the file. Keys and values must be strings. Supported only for “GPKG” driver. Not supported by Sedona.
Examples
>>> from shapely.geometry import Point, LineString
>>> from sedona.spark.geopandas import GeoDataFrame
>>> gdf = GeoDataFrame({
...     "geometry": [Point(0, 0), LineString([(0, 0), (1, 1)])],
...     "int": [1, 2]
... })
>>> gdf.to_file("output.parquet", driver="geoparquet")
With selected drivers you can also append to a file with mode="a":
>>> gdf.to_file("output.geojson", driver="geojson", mode="a")
When the index is of non-integer dtype, index=None (default) is treated as True, writing the index to the file.
>>> gdf = GeoDataFrame({"geometry": [Point(0, 0), Point(1, 1)]}, index=["a", "b"])
>>> gdf.to_file("output_with_index.parquet", driver="geoparquet")
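The driver inference described for the driver parameter can be sketched as a lookup on the file extension. The mapping below is an assumption made for illustration (the extensions Sedona actually recognises are not listed in this reference), and `infer_driver` is a hypothetical helper, not part of the library.

```python
import os

# Assumed extension-to-driver mapping, for illustration only; the extensions
# Sedona actually recognises may differ.
ASSUMED_DRIVERS = {
    ".geojson": "geojson",
    ".json": "geojson",
    ".gpkg": "geopackage",
    ".parquet": "geoparquet",
}


def infer_driver(path, driver=None):
    """Return the explicit driver if given, else guess it from the extension."""
    if driver is not None:
        return driver
    ext = os.path.splitext(path)[1].lower()
    if not ext:
        # Mirrors the documented behaviour: no extension means an error.
        raise ValueError(f"cannot infer a driver for {path!r}: no file extension")
    try:
        return ASSUMED_DRIVERS[ext]
    except KeyError:
        raise ValueError(f"no known driver for extension {ext!r}") from None
```

Passing driver explicitly, as the examples above do, avoids depending on any such inference.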
- to_geopandas() GeoDataFrame [source]
Note: Unlike pandas and geopandas, which return a RangeIndex by default, Sedona always returns a general Index, e.g. pd.Index([0, 1, 2]) instead of pd.RangeIndex(start=0, stop=3, step=1).
- to_json(na: Literal['null', 'drop', 'keep'] = 'null', show_bbox: bool = False, drop_id: bool = False, to_wgs84: bool = False, **kwargs) str [source]
Returns a GeoJSON representation of the GeoDataFrame as a string.
- Parameters:
na ({'null', 'drop', 'keep'}, default 'null') – Dictates how to represent missing (NaN) values in the output.
- null: Outputs missing entries as JSON null.
- drop: Removes the entire property from a feature if its value is missing.
- keep: Outputs missing entries as NaN.
show_bbox (bool, default False) – If True, the bbox (bounds) of the geometries is included in the output.
drop_id (bool, default False) – If True, the GeoDataFrame index is not written to the ‘id’ field of each GeoJSON Feature.
to_wgs84 (bool, default False) – If True, all geometries are transformed to WGS84 (EPSG:4326) to meet the 2016 GeoJSON specification. When False, the current CRS is exported if it’s set.
**kwargs – Additional keyword arguments passed to json.dumps().
- Returns:
A GeoJSON representation of the GeoDataFrame.
- Return type:
str
See also
GeoDataFrame.to_file
Write a GeoDataFrame to a file, which can be used for GeoJSON format.
Examples
>>> from sedona.spark.geopandas import GeoDataFrame
>>> from shapely.geometry import Point
>>> d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
>>> gdf = GeoDataFrame(d, crs="EPSG:3857")
>>> gdf.to_json()
'{"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature", "properties": {"col1": "name1"}, "geometry": {"type": "Point", "coordinates": [1.0, 2.0]}}, {"id": "1", "type": "Feature", "properties": {"col1": "name2"}, "geometry": {"type": "Point", "coordinates": [2.0, 1.0]}}], "crs": {"type": "name", "properties": {"name": "urn:ogc:def:crs:EPSG::3857"}}}'
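The three na modes are easier to compare side by side. The following sketch builds a FeatureCollection for point rows with the standard json module; `to_geojson` is a hypothetical helper that only mimics the documented handling of missing values and omits the crs and bbox members.

```python
import json
import math


def to_geojson(rows, na="null"):
    """Build a FeatureCollection string from (x, y, properties) rows."""
    features = []
    for i, (x, y, props) in enumerate(rows):
        out = {}
        for key, value in props.items():
            missing = isinstance(value, float) and math.isnan(value)
            if missing and na == "drop":
                continue          # omit the property entirely
            if missing and na == "null":
                value = None      # serialised as JSON null
            out[key] = value      # na == "keep" leaves NaN untouched
        features.append({
            "id": str(i),
            "type": "Feature",
            "properties": out,
            "geometry": {"type": "Point", "coordinates": [float(x), float(y)]},
        })
    return json.dumps({"type": "FeatureCollection", "features": features})


rows = [(1, 2, {"col1": "name1"}), (2, 1, {"col1": float("nan")})]
as_null = json.loads(to_geojson(rows, na="null"))
as_drop = json.loads(to_geojson(rows, na="drop"))
as_keep_text = to_geojson(rows, na="keep")
```

Note that 'keep' emits the bare token NaN, which is accepted by Python's json module but is not valid strict JSON.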
- to_parquet(path, **kwargs)[source]
Write the GeoDataFrame to a GeoParquet file.
- Parameters:
path (str) – The file path where the GeoParquet file will be written.
**kwargs – Additional arguments to pass to the Sedona DataFrame output function.
Examples
>>> from shapely.geometry import Point
>>> from sedona.spark.geopandas import GeoDataFrame
>>> gdf = GeoDataFrame({"geometry": [Point(0, 0), Point(1, 1)], "value": [1, 2]})
>>> gdf.to_parquet("output.parquet")
- to_spark_pandas() pyspark.pandas.DataFrame [source]
Convert the GeoDataFrame to a Spark Pandas DataFrame.
- property type
sedona.spark.geopandas.geoseries module
- class sedona.spark.geopandas.geoseries.GeoSeries(*args: Any, **kwargs: Any)[source]
-
A pandas-on-Spark Series for geometric/spatial operations.
GeoSeries extends pyspark.pandas.Series to provide spatial operations using Apache Sedona’s spatial functions. It maintains compatibility with GeoPandas GeoSeries while operating on distributed datasets.
- Parameters:
data (array-like, Iterable, dict, or scalar value) – Contains the data for the GeoSeries. Can be geometries, WKB bytes, or other GeoSeries/GeoDataFrame objects.
index (array-like or Index (1d), optional) – Values must be hashable and have the same length as data.
crs (pyproj.CRS, optional) – Coordinate Reference System for the geometries.
dtype (dtype, optional) – Data type for the GeoSeries.
name (str, optional) – Name of the GeoSeries.
copy (bool, default False) – Whether to copy the input data.
Examples
>>> from shapely.geometry import Point, Polygon
>>> from sedona.spark.geopandas import GeoSeries
>>>
>>> # Create from geometries
>>> s = GeoSeries([Point(0, 0), Point(1, 1)], crs='EPSG:4326')
>>> s
0    POINT (0 0)
1    POINT (1 1)
dtype: geometry
>>>
>>> # Spatial operations
>>> s.buffer(0.1).area
0    0.031416
1    0.031416
dtype: float64
>>>
>>> # CRS operations
>>> s_utm = s.to_crs('EPSG:32633')
>>> s_utm.crs
<Projected CRS: EPSG:32633>
Name: WGS 84 / UTM zone 33N
...
Notes
This implementation differs from GeoPandas in several ways:
- Uses Spark for distributed processing
- Geometries are stored in WKB (Well-Known Binary) format internally
- Some methods may have different performance characteristics
- Not all GeoPandas methods are implemented yet (see IMPLEMENTATION_STATUS)
Performance Considerations:
- Operations are distributed across the Spark cluster
- Avoid calling .to_geopandas() on large datasets
- Use .sample() for testing with large datasets
See also
geopandas.GeoSeries
The GeoPandas equivalent
sedona.spark.geopandas.GeoDataFrame
DataFrame with geometry column
- __init__(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False, crs=None, **kwargs)[source]
Initialize a GeoSeries object.
Parameters:
- data: The input data for the GeoSeries. It can be a GeoDataFrame, GeoSeries, or pandas Series.
- index: The index for the GeoSeries.
- crs: Coordinate Reference System for the GeoSeries.
- dtype: Data type for the GeoSeries.
- name: Name of the GeoSeries.
- copy: Whether to copy the input data.
- fastpath: Internal parameter for fast initialization.
Examples
>>> from shapely.geometry import Point
>>> import geopandas as gpd
>>> import pandas as pd
>>> from sedona.spark.geopandas import GeoSeries
# Example 1: Initialize with GeoDataFrame
>>> gdf = gpd.GeoDataFrame({'geometry': [Point(1, 1), Point(2, 2)]})
>>> gs = GeoSeries(data=gdf)
>>> print(gs)
0    POINT (1 1)
1    POINT (2 2)
Name: geometry, dtype: geometry
# Example 2: Initialize with GeoSeries
>>> gseries = gpd.GeoSeries([Point(1, 1), Point(2, 2)])
>>> gs = GeoSeries(data=gseries)
>>> print(gs)
0    POINT (1 1)
1    POINT (2 2)
dtype: geometry
# Example 3: Initialize with pandas Series
>>> pseries = pd.Series([Point(1, 1), Point(2, 2)])
>>> gs = GeoSeries(data=pseries)
>>> print(gs)
0    POINT (1 1)
1    POINT (2 2)
dtype: geometry
- property area: pyspark.pandas.Series
Returns a Series containing the area of each geometry in the GeoSeries expressed in the units of the CRS.
- Returns:
A Series containing the area of each geometry.
- Return type:
Series
Examples
>>> from shapely.geometry import Polygon
>>> from sedona.spark.geopandas import GeoSeries
>>> gs = GeoSeries([Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]), Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])])
>>> gs.area
0    1.0
1    4.0
dtype: float64
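For planar coordinates, the area of a simple polygon without holes can be computed with the shoelace formula. The sketch below is a standard-library illustration of that formula applied to the two squares above; `polygon_area` is a hypothetical helper and says nothing about how Sedona computes areas internally.

```python
def polygon_area(coords):
    """Unsigned area of a simple polygon ring given as (x, y) tuples."""
    twice_area = 0.0
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


unit_square = polygon_area([(0, 0), (1, 0), (1, 1), (0, 1)])
big_square = polygon_area([(0, 0), (2, 0), (2, 2), (0, 2)])
```

Because the formula works on raw coordinates, the result is in squared CRS units, which is why areas computed on degree-based coordinates are rarely meaningful.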
- property boundary: GeoSeries
Returns a GeoSeries of lower dimensional objects representing each geometry’s set-theoretic boundary.
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...         Point(0, 0),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1        LINESTRING (0 0, 1 1, 1 0)
2                       POINT (0 0)
dtype: geometry
>>> s.boundary
0    LINESTRING (0 0, 1 1, 0 1, 0 0)
1          MULTIPOINT ((0 0), (1 0))
2           GEOMETRYCOLLECTION EMPTY
dtype: geometry
See also
GeoSeries.exterior
outer boundary (without interior rings)
- property bounds: pyspark.pandas.DataFrame
Returns a DataFrame with columns minx, miny, maxx, maxy values containing the bounds for each geometry.
See GeoSeries.total_bounds for the limits of the entire series.
Examples
>>> import geopandas
>>> from shapely.geometry import Point, Polygon, LineString
>>> d = {'geometry': [Point(2, 1), Polygon([(0, 0), (1, 1), (1, 0)]),
...      LineString([(0, 1), (1, 2)])]}
>>> gdf = geopandas.GeoDataFrame(d, crs="EPSG:4326")
>>> gdf.bounds
   minx  miny  maxx  maxy
0   2.0   1.0   2.0   1.0
1   0.0   0.0   1.0   1.0
2   0.0   1.0   1.0   2.0
You can assign the bounds to the GeoDataFrame as:
>>> import pandas as pd
>>> gdf = pd.concat([gdf, gdf.bounds], axis=1)
>>> gdf
                         geometry  minx  miny  maxx  maxy
0                     POINT (2 1)   2.0   1.0   2.0   1.0
1  POLYGON ((0 0, 1 1, 1 0, 0 0))   0.0   0.0   1.0   1.0
2           LINESTRING (0 1, 1 2)   0.0   1.0   1.0   2.0
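The bounds of a geometry are simply the minimum and maximum of its vertex coordinates on each axis. As a minimal standard-library sketch (the helpers `bounds` and `total_bounds` are hypothetical and take plain coordinate lists rather than geometry objects):

```python
def bounds(coords):
    """Return (minx, miny, maxx, maxy) for a list of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def total_bounds(geometries):
    """Combine per-geometry bounds into one envelope for the whole series."""
    per_geom = [bounds(coords) for coords in geometries]
    return (
        min(b[0] for b in per_geom),
        min(b[1] for b in per_geom),
        max(b[2] for b in per_geom),
        max(b[3] for b in per_geom),
    )


# Vertex lists for the point, polygon, and linestring from the example above.
geoms = [
    [(2, 1)],
    [(0, 0), (1, 1), (1, 0)],
    [(0, 1), (1, 2)],
]
per_row = [bounds(g) for g in geoms]
overall = total_bounds(geoms)
```

Here per_row corresponds to the rows of the bounds table, and overall to what a total_bounds-style summary would report.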
- buffer(distance, resolution=16, cap_style='round', join_style='round', mitre_limit=5.0, single_sided=False, **kwargs) GeoSeries [source]
Returns a GeoSeries with all geometries buffered by the specified distance.
- Parameters:
distance (float) – The distance to buffer by. Negative distances will create inward buffers.
resolution (int, default 16) – The resolution of the buffer around each vertex. Specifies the number of linear segments in a quarter circle in the approximation of circular arcs.
cap_style (str, default "round") – The style of the buffer cap. One of ‘round’, ‘flat’, ‘square’.
join_style (str, default "round") – The style of the buffer join. One of ‘round’, ‘mitre’, ‘bevel’.
mitre_limit (float, default 5.0) – The mitre limit ratio for joins when join_style=’mitre’.
single_sided (bool, default False) – Whether to create a single-sided buffer. In Sedona, True will default to left-sided buffer. However, ‘right’ may be specified to use a right-sided buffer.
- Returns:
A new GeoSeries with buffered geometries.
- Return type:
GeoSeries
Examples
>>> from shapely.geometry import Point
>>> from sedona.spark.geopandas import GeoDataFrame
>>>
>>> data = {
...     'geometry': [Point(0, 0), Point(1, 1)],
...     'value': [1, 2]
... }
>>> gdf = GeoDataFrame(data)
>>> buffered = gdf.buffer(0.5)
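The resolution parameter controls how closely a round buffer approximates a true circle. The sketch below assumes that a point buffer is a regular polygon with 4 * resolution vertices placed on the circle of radius distance; that assumption and the helpers `buffer_point` and `ring_area` are illustrative only, and Sedona's exact vertex placement is not specified in this reference.

```python
import math


def buffer_point(x, y, distance, resolution=16):
    """Approximate a round point buffer with 4 * resolution vertices."""
    n = 4 * resolution  # `resolution` segments per quarter circle
    return [
        (x + distance * math.cos(2 * math.pi * k / n),
         y + distance * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def ring_area(coords):
    """Shoelace area of a closed ring given as (x, y) vertices."""
    n = len(coords)
    total = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


radius = 0.5
areas = {res: ring_area(buffer_point(0, 0, radius, res)) for res in (4, 16, 64)}
circle_area = math.pi * radius ** 2
```

Under this assumption the polygon area grows with resolution but always stays slightly below the area of the true circle.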
- property centroid: GeoSeries
Returns a GeoSeries of points representing the centroid of each geometry.
Note that centroid does not have to be on or within original geometry.
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...         Point(0, 0),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1        LINESTRING (0 0, 1 1, 1 0)
2                       POINT (0 0)
dtype: geometry
>>> s.centroid
0    POINT (0.33333 0.66667)
1        POINT (0.70711 0.5)
2                POINT (0 0)
dtype: geometry
See also
GeoSeries.representative_point
point guaranteed to be within each geometry
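The centroid values in the example can be reproduced with the textbook formulas: an area-weighted average for polygons and a length-weighted average of segment midpoints for linestrings. This is a standard-library sketch with hypothetical helper names, not a description of Sedona's internals.

```python
import math


def polygon_centroid(coords):
    """Area-weighted centroid of a simple polygon ring of (x, y) vertices."""
    area2 = cx = cy = 0.0
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        area2 += cross                 # accumulates twice the signed area
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    return cx / (3 * area2), cy / (3 * area2)


def linestring_centroid(coords):
    """Length-weighted average of the midpoints of each segment."""
    total = cx = cy = 0.0
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        length = math.hypot(x2 - x1, y2 - y1)
        total += length
        cx += length * (x1 + x2) / 2
        cy += length * (y1 + y2) / 2
    return cx / total, cy / total


triangle = polygon_centroid([(0, 0), (1, 1), (0, 1)])
line = linestring_centroid([(0, 0), (1, 1), (1, 0)])
```

The linestring result lies off the line itself, which illustrates the note that a centroid need not be on or within the original geometry.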
- contains(other, align=None) pyspark.pandas.Series [source]
Returns a Series of dtype('bool') with value True for each aligned geometry that contains other.
An object is said to contain other if at least one point of other lies in the interior and no points of other lie in the exterior of the object. (Therefore, any given polygon does not contain its own boundary - there is not any point that lies in the interior.) If either object is empty, this operation returns False.
This is the inverse of within in the sense that the expression a.contains(b) == b.within(a) always evaluates to True.
Note: Sedona’s implementation instead returns False for identical geometries.
The operation works on a 1-to-1 row-wise manner.
- Parameters:
- Return type:
Series (bool)
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (0, 2)]),
...         LineString([(0, 0), (0, 1)]),
...         Point(0, 1),
...     ],
...     index=range(0, 4),
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (1, 2), (0, 2)]),
...         LineString([(0, 0), (0, 2)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1             LINESTRING (0 0, 0 2)
2             LINESTRING (0 0, 0 1)
3                       POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2    POLYGON ((0 0, 1 2, 0 2, 0 0))
3             LINESTRING (0 0, 0 2)
4                       POINT (0 1)
dtype: geometry
We can check if each geometry of GeoSeries contains a single geometry:
>>> point = Point(0, 1)
>>> s.contains(point)
0    False
1     True
2    False
3     True
dtype: bool
We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:
>>> s2.contains(s, align=True)
0    False
1    False
2    False
3     True
4    False
dtype: bool
>>> s2.contains(s, align=False)
1     True
2    False
3     True
4     True
dtype: bool
Notes
This method works in a row-wise manner. It does not check if an element of one GeoSeries contains any element of the other one.
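The difference between align=True and align=False can be sketched with plain dictionaries standing in for indexed series. `rowwise` is a hypothetical helper; it mirrors the documented outputs in that labels missing from either side yield False when aligning, and that positional comparison keeps the left-hand labels.

```python
def rowwise(left, right, predicate, align=True):
    """Apply `predicate` pairwise to two label -> value mappings."""
    if align:
        # Pair values that share a label; labels present on only one side
        # have nothing to compare against and evaluate to False.
        labels = sorted(set(left) | set(right))
        return {
            k: (k in left and k in right and predicate(left[k], right[k]))
            for k in labels
        }
    # Ignore labels: pair values by position and keep the left-hand labels.
    return {k: predicate(a, b) for (k, a), b in zip(left.items(), right.values())}


def at_least(a, b):
    # Stand-in for a binary spatial predicate such as contains.
    return a >= b


left = {1: 5, 2: 1, 3: 7, 4: 3}
right = {0: 4, 1: 9, 2: 0, 3: 7}
aligned = rowwise(left, right, at_least, align=True)
positional = rowwise(left, right, at_least, align=False)
```

With alignment the result covers the union of labels 0 through 4; without it the result has exactly as many rows as the left input.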
- property convex_hull
- copy(deep=False)[source]
Make a copy of this GeoSeries object.
- Parameters:
deep (bool, default False) – If True, a deep copy of the data is made. Otherwise, a shallow copy is made.
- Returns:
A copy of this GeoSeries object.
- Return type:
Examples
>>> from shapely.geometry import Point
>>> from sedona.spark.geopandas import GeoSeries
>>> gs = GeoSeries([Point(1, 1), Point(2, 2)])
>>> gs_copy = gs.copy()
>>> print(gs_copy)
0    POINT (1 1)
1    POINT (2 2)
dtype: geometry
- covered_by(other, align=None) pyspark.pandas.Series [source]
Returns a Series of dtype('bool') with value True for each aligned geometry that is entirely covered by other.
An object A is said to cover another object B if no points of B lie in the exterior of A.
Note: Sedona’s implementation instead returns False for identical geometries. Sedona’s behavior may differ from Geopandas for GeometryCollections.
The operation works on a 1-to-1 row-wise manner.
See https://lin-ear-th-inking.blogspot.com/2007/06/subtleties-of-ogc-covers-spatial.html for reference.
- Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to test as the covering geometry.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.
- Return type:
Series (bool)
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]),
...         Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
...         LineString([(1, 1), (1.5, 1.5)]),
...         Point(0, 0),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         Point(0, 0),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0.5 0.5, 1.5 0.5, 1.5 1.5, 0.5 1.5, ...
1                  POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))
2                            LINESTRING (1 1, 1.5 1.5)
3                                          POINT (0 0)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))
2         POLYGON ((0 0, 2 2, 0 2, 0 0))
3                  LINESTRING (0 0, 2 2)
4                            POINT (0 0)
dtype: geometry
We can check if each geometry of GeoSeries is covered by a single geometry:
>>> poly = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
>>> s.covered_by(poly)
0    True
1    True
2    True
3    True
dtype: bool
We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:
>>> s.covered_by(s2, align=True)
0    False
1     True
2     True
3     True
4    False
dtype: bool
>>> s.covered_by(s2, align=False)
0     True
1    False
2     True
3     True
dtype: bool
Notes
This method works in a row-wise manner. It does not check if an element of one GeoSeries is covered_by any element of the other one.
- covers(other, align=None) pyspark.pandas.Series [source]
Returns a Series of dtype('bool') with value True for each aligned geometry that is entirely covering other.
An object A is said to cover another object B if no points of B lie in the exterior of A. If either object is empty, this operation returns False.
Note: Sedona’s implementation instead returns False for identical geometries. Sedona’s behavior may also differ from Geopandas for GeometryCollections.
The operation works on a 1-to-1 row-wise manner.
See https://lin-ear-th-inking.blogspot.com/2007/06/subtleties-of-ogc-covers-spatial.html for reference.
- Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to check for being covered.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.
- Return type:
Series (bool)
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         Point(0, 0),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]),
...         Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
...         LineString([(1, 1), (1.5, 1.5)]),
...         Point(0, 0),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))
1         POLYGON ((0 0, 2 2, 0 2, 0 0))
2                  LINESTRING (0 0, 2 2)
3                            POINT (0 0)
dtype: geometry
>>> s2
1    POLYGON ((0.5 0.5, 1.5 0.5, 1.5 1.5, 0.5 1.5, ...
2                  POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))
3                            LINESTRING (1 1, 1.5 1.5)
4                                          POINT (0 0)
dtype: geometry
We can check if each geometry of GeoSeries covers a single geometry:
>>> poly = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
>>> s.covers(poly)
0     True
1    False
2    False
3    False
dtype: bool
We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:
>>> s.covers(s2, align=True)
0    False
1    False
2    False
3    False
4    False
dtype: bool
>>> s.covers(s2, align=False)
0     True
1    False
2     True
3     True
dtype: bool
Notes
This method works in a row-wise manner. It does not check if an element of one GeoSeries covers any element of the other one.
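The practical difference between contains and covers shows up on the boundary. The sketch below applies the standard definitions quoted in this reference to the simplest case, a point tested against an axis-aligned rectangle; `Rect`, `covers_point`, and `contains_point` are hypothetical helpers, and the Sedona-specific deviation for identical geometries is not modelled.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle standing in for a polygon (illustration only)."""
    minx: float
    miny: float
    maxx: float
    maxy: float


def covers_point(rect, x, y):
    # covers: no point of the other geometry lies in the exterior,
    # so a point on the boundary still counts.
    return rect.minx <= x <= rect.maxx and rect.miny <= y <= rect.maxy


def contains_point(rect, x, y):
    # contains: additionally needs a point of the other geometry in the
    # interior, so for a single point the comparison must be strict.
    return rect.minx < x < rect.maxx and rect.miny < y < rect.maxy


square = Rect(0, 0, 2, 2)
corner_covered = covers_point(square, 0, 0)
corner_contained = contains_point(square, 0, 0)
center_contained = contains_point(square, 1, 1)
```

A corner of the square is covered but not contained, while an interior point satisfies both predicates.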
- crosses(other, align=None) pyspark.pandas.Series [source]
Returns a Series of dtype('bool') with value True for each aligned geometry that crosses other.
An object is said to cross other if its interior intersects the interior of the other but does not contain it, and the dimension of the intersection is less than the dimension of the one or the other.
Note: Unlike Geopandas, Sedona’s implementation always returns NULL when a GeometryCollection is involved.
The operation works on a 1-to-1 row-wise manner.
- Parameters:
- Return type:
Series (bool)
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         LineString([(1, 0), (1, 3)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(1, 1),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1             LINESTRING (0 0, 2 2)
2             LINESTRING (2 0, 0 2)
3                       POINT (0 1)
dtype: geometry
>>> s2
1    LINESTRING (1 0, 1 3)
2    LINESTRING (2 0, 0 2)
3              POINT (1 1)
4              POINT (0 1)
dtype: geometry
We can check if each geometry of GeoSeries crosses a single geometry:
>>> line = LineString([(-1, 1), (3, 1)])
>>> s.crosses(line)
0     True
1     True
2     True
3    False
dtype: bool
We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:
>>> s.crosses(s2, align=True)
0    False
1     True
2    False
3    False
4    False
dtype: bool
>>> s.crosses(s2, align=False)
0     True
1     True
2    False
3    False
dtype: bool
Notice that a line does not cross a point that it contains.
Notes
This method works in a row-wise manner. It does not check if an element of one GeoSeries crosses any element of the other one.
- property crs: CRS | None
The Coordinate Reference System (CRS) as a pyproj.CRS object.
Returns a pyproj.CRS, or None if the CRS is not set. When setting, the value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g. “EPSG:4326”) or a WKT string.
Note: This assumes all records in the GeoSeries have the same CRS.
Examples
>>> from shapely.geometry import Point
>>> from sedona.spark.geopandas import GeoSeries
>>> s = GeoSeries([Point(1, 1), Point(2, 2)], crs='EPSG:4326')
>>> s.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
See also
GeoSeries.set_crs
assign CRS
GeoSeries.to_crs
re-project to another CRS
- difference(other, align=None) GeoSeries [source]
Returns a GeoSeries of the points in each aligned geometry that are not in other.
The operation works in a 1-to-1 row-wise manner.
Unlike Geopandas, Sedona does not support this operation for GeometryCollections.
- Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to find the difference to.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.
- Return type:
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(1, 0), (1, 3)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(1, 1),
...         Point(0, 1),
...     ],
...     index=range(1, 6),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2    LINESTRING (0 0, 2 2)
3    LINESTRING (2 0, 0 2)
4    POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2    LINESTRING (1 0, 1 3)
3    LINESTRING (2 0, 0 2)
4    POINT (1 1)
5    POINT (0 1)
dtype: geometry
We can compute the difference between each geometry and a single shapely geometry:
>>> point = Point(0, 1)
>>> s2.difference(point)
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2    LINESTRING (1 0, 1 3)
3    LINESTRING (2 0, 0 2)
4    POINT (1 1)
5    GEOMETRYCOLLECTION EMPTY
dtype: geometry
We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:
>>> s.difference(s2, align=True)
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2    LINESTRING (0 0, 2 2)
3    LINESTRING (2 0, 0 2)
4    POINT (0 1)
5    POINT (0 1)
dtype: geometry
>>> s.difference(s2, align=False)
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2    GEOMETRYCOLLECTION EMPTY
3    LINESTRING (2 0, 0 2)
4    GEOMETRYCOLLECTION EMPTY
dtype: geometry
Notes
This method works in a row-wise manner. It does not check if an element of one GeoSeries is different from any element of the other one.
See also
- distance(other, align=None) pyspark.pandas.Series [source]
Returns a Series containing the distance to aligned other.
The operation works in a 1-to-1 row-wise manner.
- Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to find the distance to.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.
- Return type:
Series (float)
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 0), (1, 1)]),
...         Polygon([(0, 0), (-1, 0), (-1, 1)]),
...         LineString([(1, 1), (0, 0)]),
...         Point(0, 0),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]),
...         Point(3, 1),
...         LineString([(1, 0), (2, 0)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 1 0, 1 1, 0 0))
1    POLYGON ((0 0, -1 0, -1 1, 0 0))
2    LINESTRING (1 1, 0 0)
3    POINT (0 0)
dtype: geometry
>>> s2
1    POLYGON ((0.5 0.5, 1.5 0.5, 1.5 1.5, 0.5 1.5, ...
2    POINT (3 1)
3    LINESTRING (1 0, 2 0)
4    POINT (0 1)
dtype: geometry
We can check the distance of each geometry of GeoSeries to a single geometry:
>>> point = Point(-1, 0)
>>> s.distance(point)
0    1.0
1    0.0
2    1.0
3    1.0
dtype: float64
We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and use elements with the same index using align=True or ignore index and use elements based on their matching order using align=False:
>>> s.distance(s2, align=True)
0         NaN
1    0.707107
2    2.000000
3    1.000000
4         NaN
dtype: float64
>>> s.distance(s2, align=False)
0    0.000000
1    3.162278
2    0.707107
3    1.000000
dtype: float64
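The effect of align on a row-wise operation can be sketched in plain Python. This is a simplified illustration and not Sedona's implementation: each series is modelled as a dict from index label to an (x, y) point, and the helper name rowwise_distance is hypothetical. With align=True, rows are paired by index label and unmatched labels yield None (displayed as NaN in the output above); with align=False, rows are paired by position and the left-hand index is kept.

```python
import math


def rowwise_distance(left, right, align=True):
    """Pair two {index: (x, y)} mappings row by row and return distances.

    Illustrative only: real geometries are not limited to points.
    """
    if align:
        # Pair rows by index label; labels missing on either side give None.
        labels = sorted(set(left) | set(right))
        return {
            k: math.dist(left[k], right[k]) if k in left and k in right else None
            for k in labels
        }
    # Pair rows by position and keep the left-hand index labels.
    return {
        k: math.dist(p, q)
        for (k, p), q in zip(left.items(), right.values())
    }


a = {0: (0, 0), 1: (0, 3)}
b = {1: (4, 0), 2: (0, 5)}
print(rowwise_distance(a, b, align=True))   # {0: None, 1: 5.0, 2: None}
print(rowwise_distance(a, b, align=False))  # {0: 4.0, 1: 2.0}
```

The same pairing rule applies to the other binary methods in this class (crosses, difference, dwithin, intersection, intersects); only the per-row operation changes.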
- dwithin(other, distance, align=None)[source]
Returns a Series of dtype('bool') with value True for each aligned geometry that is within a set distance from other.
The operation works in a 1-to-1 row-wise manner.
- Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to measure the distance to.
distance (float, np.array, pd.Series) – Distance(s) to test if each geometry is within. A scalar distance will be applied to all geometries. An array or Series will be applied elementwise. If an np.array or pd.Series is used, it must have the same length as the GeoSeries.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. If False, the order of elements is preserved. None defaults to True.
- Return type:
Series (bool)
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (0, 2)]),
...         LineString([(0, 0), (0, 1)]),
...         Point(0, 1),
...     ],
...     index=range(0, 4),
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(1, 0), (4, 2), (2, 2)]),
...         Polygon([(2, 0), (3, 2), (2, 2)]),
...         LineString([(2, 0), (2, 2)]),
...         Point(1, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1    LINESTRING (0 0, 0 2)
2    LINESTRING (0 0, 0 1)
3    POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((1 0, 4 2, 2 2, 1 0))
2    POLYGON ((2 0, 3 2, 2 2, 2 0))
3    LINESTRING (2 0, 2 2)
4    POINT (1 1)
dtype: geometry
We can check if each geometry of GeoSeries is within a set distance of a single geometry:
>>> point = Point(0, 1)
>>> s2.dwithin(point, 1.8)
1     True
2    False
3    False
4     True
dtype: bool
We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:
>>> s.dwithin(s2, distance=1, align=True)
0    False
1     True
2    False
3    False
4    False
dtype: bool
>>> s.dwithin(s2, distance=1, align=False)
0     True
1    False
2    False
3     True
dtype: bool
Notes
This method works in a row-wise manner. It does not check if an element of one GeoSeries is within the set distance of any element of the other one.
See also
- property envelope: GeoSeries
Returns a GeoSeries of geometries representing the envelope of each geometry.
The envelope of a geometry is the bounding rectangle. That is, the point or smallest rectangular polygon (with sides parallel to the coordinate axes) that contains the geometry.
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point, MultiPoint
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...         MultiPoint([(0, 0), (1, 1)]),
...         Point(0, 0),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1    LINESTRING (0 0, 1 1, 1 0)
2    MULTIPOINT ((0 0), (1 1))
3    POINT (0 0)
dtype: geometry
>>> s.envelope
0    POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))
1    POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))
2    POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))
3    POINT (0 0)
dtype: geometry
See also
GeoSeries.convex_hull
convex hull geometry
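As a rough illustration of the envelope definition above, the following plain-Python sketch derives the bounding rectangle from a list of (x, y) coordinates. The function name and the tuple-based return value are assumptions made for this example, and zero-width or zero-height boxes are not treated specially; it is not how Sedona computes envelopes internally.

```python
def envelope(coords):
    """Return the axis-aligned bounding geometry of a list of (x, y) tuples.

    Returns ("Point", (x, y)) when all coordinates coincide, otherwise
    ("Polygon", ring) with a closed ring listed counter-clockwise from
    the lower-left corner.
    """
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    minx, maxx = min(xs), max(xs)
    miny, maxy = min(ys), max(ys)
    if minx == maxx and miny == maxy:
        # A single location has no extent, so the envelope is the point itself.
        return ("Point", (minx, miny))
    ring = [(minx, miny), (maxx, miny), (maxx, maxy), (minx, maxy), (minx, miny)]
    return ("Polygon", ring)


print(envelope([(0, 0), (1, 1), (0, 1)]))
# ('Polygon', [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)])
print(envelope([(0, 0)]))
# ('Point', (0, 0))
```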
- estimate_utm_crs(datum_name: str = 'WGS 84') CRS [source]
Returns the estimated UTM CRS based on the bounds of the dataset.
- Parameters:
datum_name (str, optional) – The name of the datum to use in the query. Default is WGS 84.
- Return type:
pyproj.CRS
Examples
>>> import geodatasets  # requires: pip install geodatasets
>>> import geopandas as gpd
>>> df = gpd.read_file(
...     geodatasets.get_path("geoda.chicago_commpop")
... )
>>> df.geometry.values.estimate_utm_crs()
<Derived Projected CRS: EPSG:32616>
Name: WGS 84 / UTM zone 16N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Between 90°W and 84°W, northern hemisphere between equator and 84°N,...
- bounds: (-90.0, 0.0, -84.0, 84.0)
Coordinate Operation:
- name: UTM zone 16N
- method: Transverse Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
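For intuition about why the Chicago dataset maps to EPSG:32616, the standard UTM zone arithmetic can be sketched as below. This is a simplified illustration under stated assumptions: it takes a single representative longitude/latitude (for example the centre of the dataset bounds), assumes the WGS 84 datum, and ignores the special zones around Norway and Svalbard. The function name is hypothetical, and the actual method may resolve the CRS differently.

```python
import math


def utm_epsg_for_point(lon, lat):
    """Return the WGS 84 / UTM EPSG code for a longitude/latitude in degrees."""
    # UTM zones are 6 degrees wide, numbered 1..60 eastward from 180°W.
    zone = int(math.floor((lon + 180) / 6)) % 60 + 1
    # EPSG:326xx covers the northern hemisphere, EPSG:327xx the southern.
    base = 32600 if lat >= 0 else 32700
    return base + zone


print(utm_epsg_for_point(-87.6, 41.9))   # 32616 (UTM zone 16N)
print(utm_epsg_for_point(151.2, -33.9))  # 32756 (UTM zone 56S)
```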
- property exterior
- fillna(value=None, inplace: bool = False, limit=None, **kwargs) GeoSeries | None [source]
Fill NA values with geometry (or geometries).
- Parameters:
value (shapely geometry or GeoSeries, default None) – If None is passed, NA values will be filled with GEOMETRYCOLLECTION EMPTY. If a shapely geometry object is passed, it will be used to fill all missing values. If a GeoSeries is passed, missing values will be filled based on the corresponding index locations. If pd.NA or np.nan are passed, values will be filled with None (not GEOMETRYCOLLECTION EMPTY).
limit (int, default None) – This is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.
- Return type:
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         None,
...         Polygon([(0, 0), (-1, 1), (0, -1)]),
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1    None
2    POLYGON ((0 0, -1 1, 0 -1, 0 0))
dtype: geometry
Filled with an empty geometry collection (the default).
>>> s.fillna()
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1    GEOMETRYCOLLECTION EMPTY
2    POLYGON ((0 0, -1 1, 0 -1, 0 0))
dtype: geometry
Filled with a specific polygon.
>>> s.fillna(Polygon([(0, 1), (2, 1), (1, 2)]))
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1    POLYGON ((0 1, 2 1, 1 2, 0 1))
2    POLYGON ((0 0, -1 1, 0 -1, 0 0))
dtype: geometry
Filled with another GeoSeries.
>>> from shapely.geometry import Point
>>> s_fill = GeoSeries(
...     [
...         Point(0, 0),
...         Point(1, 1),
...         Point(2, 2),
...     ]
... )
>>> s.fillna(s_fill)
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1    POINT (1 1)
2    POLYGON ((0 0, -1 1, 0 -1, 0 0))
dtype: geometry
See also
GeoSeries.isna
detect missing values
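The three fill modes described for the value parameter can be summarised with a small plain-Python model. This is only a sketch of the documented behaviour: series are modelled as dicts from index label to a WKT string (or None), the helper name fill_missing is hypothetical, and the limit and inplace parameters are not modelled.

```python
EMPTY = "GEOMETRYCOLLECTION EMPTY"


def fill_missing(series, value=None):
    """Fill None entries of an {index: geometry} mapping.

    value=None   -> fill with an empty geometry collection
    value=dict   -> fill from the entry with the same index label
    anything else -> fill every missing entry with that one geometry
    """
    def replacement(label):
        if value is None:
            return EMPTY
        if isinstance(value, dict):
            return value.get(label)
        return value

    return {
        label: geom if geom is not None else replacement(label)
        for label, geom in series.items()
    }


s = {0: "POLYGON ((0 0, 1 1, 0 1, 0 0))", 1: None}
print(fill_missing(s))                      # index 1 becomes GEOMETRYCOLLECTION EMPTY
print(fill_missing(s, "POINT (5 5)"))       # index 1 becomes POINT (5 5)
print(fill_missing(s, {1: "POINT (1 1)"}))  # index 1 is taken from the matching label
```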
- classmethod from_arrow(arr, **kwargs) GeoSeries [source]
Construct a GeoSeries from an Arrow array object with a GeoArrow extension type.
See https://geoarrow.org/ for details on the GeoArrow specification.
This function accepts any Arrow array object implementing the Arrow PyCapsule Protocol (i.e. having an __arrow_c_array__ method).
Note: Requires geopandas versions >= 1.0.0 to use with Sedona.
- Parameters:
arr (pyarrow.Array, Arrow array) – Any array object implementing the Arrow PyCapsule Protocol (i.e. has an __arrow_c_array__ or __arrow_c_stream__ method). The type of the array should be one of the geoarrow geometry types.
**kwargs – Other parameters passed to the GeoSeries constructor.
- Return type:
See also
GeoSeries.to_arrow, GeoDataFrame.from_arrow
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> import geoarrow.pyarrow as ga
>>> array = ga.as_geoarrow([None, "POLYGON ((0 0, 1 1, 0 1, 0 0))", "LINESTRING (0 0, -1 1, 0 -1)"])
>>> geoseries = GeoSeries.from_arrow(array)
>>> geoseries
0    None
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2    LINESTRING (0 0, -1 1, 0 -1)
dtype: geometry
- classmethod from_file(filename: str, format: str | None = None, **kwargs) GeoSeries [source]
Alternate constructor to create a GeoDataFrame from a file.
- Parameters:
filename (str) – File path or file handle to read from. If the path is a directory, Sedona will read all files in that directory.
format (str, optional) – The format of the file to read, by default None. If None, Sedona infers the format from the file extension. Note that format inference is not supported for directories. Available formats are “shapefile”, “geojson”, “geopackage”, and “geoparquet”.
table_name (str, optional) – The name of the table to read from a GeoPackage file, by default None. This is required if format is “geopackage”.
See also
GeoDataFrame.to_file
Write a GeoDataFrame to a file.
- classmethod from_wkb(data, index=None, crs: Any | None = None, on_invalid='raise', **kwargs) GeoSeries [source]
Alternate constructor to create a GeoSeries from a list or array of WKB objects.
- Parameters:
data (array-like or Series) – Series, list or array of WKB objects
index (array-like or Index) – The index for the GeoSeries.
crs (value, optional) – Coordinate Reference System of the geometry objects. Can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g. “EPSG:4326”) or a WKT string.
on_invalid ({"raise", "warn", "ignore"}, default "raise") –
raise: an exception will be raised if a WKB input geometry is invalid.
warn: a warning will be raised and invalid WKB geometries will be returned as None.
ignore: invalid WKB geometries will be returned as None without a warning.
fix: an effort is made to fix invalid input geometries (e.g. close unclosed rings). If this is not possible, they are returned as None without a warning. Requires GEOS >= 3.11 and shapely >= 2.1.
kwargs – Additional arguments passed to the Series constructor, e.g. name.
- Return type:
See also
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> wkbs = [
...     (
...         b"\x01\x01\x00\x00\x00\x00\x00\x00\x00"
...         b"\x00\x00\xf0?\x00\x00\x00\x00\x00\x00\xf0?"
...     ),
...     (
...         b"\x01\x01\x00\x00\x00\x00\x00\x00\x00"
...         b"\x00\x00\x00@\x00\x00\x00\x00\x00\x00\x00@"
...     ),
...     (
...         b"\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00"
...         b"\x00\x08@\x00\x00\x00\x00\x00\x00\x08@"
...     ),
... ]
>>> s = GeoSeries.from_wkb(wkbs)
>>> s
0    POINT (1 1)
1    POINT (2 2)
2    POINT (3 3)
dtype: geometry
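To make the byte strings above less opaque, the sketch below decodes a 2D WKB Point by hand using only the standard library. It is an illustration of the WKB layout (byte-order flag, 32-bit geometry type, two 64-bit floats) and not the parser Sedona uses; the function name is hypothetical and only plain 2D points are handled.

```python
import struct


def decode_wkb_point(wkb):
    """Decode a 2D WKB Point into an (x, y) tuple."""
    # Byte 0 is the byte-order flag: 1 = little-endian, 0 = big-endian.
    endian = "<" if wkb[0] == 1 else ">"
    # Bytes 1-4 hold the geometry type as an unsigned 32-bit integer.
    (geom_type,) = struct.unpack(endian + "I", wkb[1:5])
    if geom_type != 1:
        raise ValueError(f"expected a 2D Point (type 1), got type {geom_type}")
    # Bytes 5-20 hold the x and y coordinates as 64-bit floats.
    return struct.unpack(endian + "dd", wkb[5:21])


wkb = (
    b"\x01\x01\x00\x00\x00\x00\x00\x00\x00"
    b"\x00\x00\xf0?\x00\x00\x00\x00\x00\x00\xf0?"
)
print(decode_wkb_point(wkb))  # (1.0, 1.0)
```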
- classmethod from_wkt(data, index=None, crs: Any | None = None, on_invalid='raise', **kwargs) GeoSeries [source]
Alternate constructor to create a GeoSeries from a list or array of WKT objects.
- Parameters:
data (array-like, Series) – Series, list, or array of WKT objects
index (array-like or Index) – The index for the GeoSeries.
crs (value, optional) – Coordinate Reference System of the geometry objects. Can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g. “EPSG:4326”) or a WKT string.
on_invalid ({"raise", "warn", "ignore"}, default "raise") –
raise: an exception will be raised if a WKT input geometry is invalid.
warn: a warning will be raised and invalid WKT geometries will be returned as None.
ignore: invalid WKT geometries will be returned as None without a warning.
fix: an effort is made to fix invalid input geometries (e.g. close unclosed rings). If this is not possible, they are returned as None without a warning. Requires GEOS >= 3.11 and shapely >= 2.1.
kwargs – Additional arguments passed to the Series constructor, e.g. name.
- Return type:
See also
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> wkts = [
...     'POINT (1 1)',
...     'POINT (2 2)',
...     'POINT (3 3)',
... ]
>>> s = GeoSeries.from_wkt(wkts)
>>> s
0    POINT (1 1)
1    POINT (2 2)
2    POINT (3 3)
dtype: geometry
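The raise, warn, and ignore modes of on_invalid can be pictured with a toy parser. The sketch below is an assumption-laden illustration: it only understands 'POINT (x y)' strings, treats anything else as invalid input, and the function name parse_points is hypothetical. It shows the control flow of the three modes and is not the real WKT reader.

```python
import re
import warnings

# Toy grammar: accepts only "POINT (x y)" with plain decimal numbers.
_POINT = re.compile(r"^POINT \((-?\d+(?:\.\d+)?) (-?\d+(?:\.\d+)?)\)$")


def parse_points(wkts, on_invalid="raise"):
    """Parse 'POINT (x y)' strings, handling bad input per on_invalid."""
    if on_invalid not in {"raise", "warn", "ignore"}:
        raise ValueError(f"unsupported on_invalid value: {on_invalid!r}")
    result = []
    for text in wkts:
        match = _POINT.match(text)
        if match:
            result.append((float(match.group(1)), float(match.group(2))))
        elif on_invalid == "raise":
            raise ValueError(f"invalid WKT: {text!r}")
        else:
            if on_invalid == "warn":
                warnings.warn(f"invalid WKT replaced with None: {text!r}")
            # Both "warn" and "ignore" keep the row and store None.
            result.append(None)
    return result


print(parse_points(["POINT (1 1)", "POINT (oops)"], on_invalid="ignore"))
# [(1.0, 1.0), None]
```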
- classmethod from_xy(x, y, z=None, index=None, crs=None, **kwargs) GeoSeries [source]
Alternate constructor to create a GeoSeries of Point geometries from lists or arrays of x, y(, z) coordinates.
In case of geographic coordinates, it is assumed that longitude is captured by x coordinates and latitude by y.
- Parameters:
x (iterable)
y (iterable)
z (iterable)
index (array-like or Index, optional) – The index for the GeoSeries. If not given and all coordinate inputs are Series with an equal index, that index is used.
crs (value, optional) – Coordinate Reference System of the geometry objects. Can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g. “EPSG:4326”) or a WKT string.
**kwargs – Additional arguments passed to the Series constructor, e.g. name.
- Return type:
See also
GeoSeries.from_wkt, points_from_xy
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> x = [2.5, 5, -3.0]
>>> y = [0.5, 1, 1.5]
>>> s = GeoSeries.from_xy(x, y, crs="EPSG:4326")
>>> s
0    POINT (2.5 0.5)
1    POINT (5 1)
2    POINT (-3 1.5)
dtype: geometry
- property geom_type: pyspark.pandas.Series
Returns a Series of strings specifying the geometry type of each geometry.
Note: Unlike Geopandas, Sedona returns LineString instead of LinearRing.
- Returns:
A Series containing the geometry type of each geometry.
- Return type:
Series
Examples
>>> from shapely.geometry import Polygon, Point
>>> from sedona.spark.geopandas import GeoSeries
>>> gs = GeoSeries([Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]), Point(0, 0)])
>>> gs.geom_type
0    POLYGON
1    POINT
dtype: object
- get_geometry(index) GeoSeries [source]
Returns the n-th geometry from a collection of geometries (0-indexed).
If the index is non-negative, it returns the geometry at that index. If the index is negative, it counts backward from the end of the collection (e.g., -1 returns the last geometry). Returns None if the index is out of bounds.
Note: Simple geometries act as length-1 collections
Note: Using Shapely < 2.0 may lead to different results for empty simple geometries due to how Shapely interprets them.
- Parameters:
index (int or array_like) – Position of a geometry to be retrieved within its collection
- Return type:
Notes
Simple geometries act as collections of length 1. Any out-of-range index value returns None.
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Point, MultiPoint, Polygon, GeometryCollection
>>> s = GeoSeries(
...     [
...         Point(0, 0),
...         MultiPoint([(0, 0), (1, 1), (0, 1), (1, 0)]),
...         GeometryCollection(
...             [MultiPoint([(0, 0), (1, 1), (0, 1), (1, 0)]), Point(0, 1)]
...         ),
...         Polygon(),
...         GeometryCollection(),
...     ]
... )
>>> s
0    POINT (0 0)
1    MULTIPOINT ((0 0), (1 1), (0 1), (1 0))
2    GEOMETRYCOLLECTION (MULTIPOINT ((0 0), (1 1), ...
3    POLYGON EMPTY
4    GEOMETRYCOLLECTION EMPTY
dtype: geometry
>>> s.get_geometry(0)
0    POINT (0 0)
1    POINT (0 0)
2    MULTIPOINT ((0 0), (1 1), (0 1), (1 0))
3    POLYGON EMPTY
4    None
dtype: geometry
>>> s.get_geometry(1)
0    None
1    POINT (1 1)
2    POINT (0 1)
3    None
4    None
dtype: geometry
>>> s.get_geometry(-1)
0    POINT (0 0)
1    POINT (1 0)
2    POINT (0 1)
3    POLYGON EMPTY
4    None
dtype: geometry
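The indexing rule shown in these outputs (non-negative and negative positions, None when out of range, simple geometries acting as length-1 collections) can be modelled with ordinary Python lists. The sketch below is an illustration only: each geometry is represented as a list of its parts and the helper name get_part is hypothetical.

```python
def get_part(parts, index):
    """Return the part at `index`, or None when the index is out of range.

    `parts` is a list of sub-geometries; a simple geometry is modelled as
    a one-element list and an empty collection as an empty list.
    """
    if -len(parts) <= index < len(parts):
        return parts[index]
    return None


point = ["POINT (0 0)"]
multipoint = ["POINT (0 0)", "POINT (1 1)", "POINT (0 1)", "POINT (1 0)"]
empty_collection = []

print(get_part(multipoint, 1))        # POINT (1 1)
print(get_part(multipoint, -1))       # POINT (1 0)
print(get_part(point, 1))             # None
print(get_part(empty_collection, 0))  # None
```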
- property has_sindex
Check the existence of the spatial index without generating it.
Use the .sindex attribute on a GeoDataFrame or GeoSeries to generate a spatial index if it does not yet exist, which may take considerable time based on the underlying index implementation.
Note that the underlying spatial index may not be fully initialized until the first use.
Currently, sindex is not retained when calling this method from a GeoDataFrame. You can work around this by first extracting the active geometry column as a GeoSeries and calling this method on it.
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Point
>>> s = GeoSeries([Point(x, x) for x in range(5)])
>>> s.has_sindex
False
>>> index = s.sindex
>>> s.has_sindex
True
- Returns:
True if the spatial index has been generated or False if not.
- Return type:
- property has_z: pyspark.pandas.Series
Returns a Series of dtype('bool') with value True for features that have a z-component.
Notes
Every operation in GeoPandas is planar, i.e. the potential third dimension is not taken into account.
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Point
>>> s = GeoSeries(
...     [
...         Point(0, 1),
...         Point(0, 1, 2),
...     ]
... )
>>> s
0    POINT (0 1)
1    POINT Z (0 1 2)
dtype: geometry
>>> s.has_z
0    False
1     True
dtype: bool
- property interiors
- intersection(other: GeoSeries | BaseGeometry, align: bool | None = None) GeoSeries [source]
Returns a GeoSeries of the intersection of points in each aligned geometry with other.
The operation works in a 1-to-1 row-wise manner.
Note: Unlike most functions, intersection may return results that are unordered with respect to the index. If this is important to you, you may call sort_index() on the result.
- Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to find the intersection with.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.
- Return type:
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(1, 0), (1, 3)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(1, 1),
...         Point(0, 1),
...     ],
...     index=range(1, 6),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2    LINESTRING (0 0, 2 2)
3    LINESTRING (2 0, 0 2)
4    POINT (0 1)
dtype: geometry
>>> s2
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2    LINESTRING (1 0, 1 3)
3    LINESTRING (2 0, 0 2)
4    POINT (1 1)
5    POINT (0 1)
dtype: geometry
We can compute the intersection of each geometry with a single shapely geometry:
>>> s.intersection(Polygon([(0, 0), (1, 1), (0, 1)]))
0    POLYGON ((0 0, 0 1, 1 1, 0 0))
1    POLYGON ((0 0, 0 1, 1 1, 0 0))
2    LINESTRING (0 0, 1 1)
3    POINT (1 1)
4    POINT (0 1)
dtype: geometry
We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:
>>> s.intersection(s2, align=True)
0    None
1    POLYGON ((0 0, 0 1, 1 1, 0 0))
2    POINT (1 1)
3    LINESTRING (2 0, 0 2)
4    POINT EMPTY
5    None
dtype: geometry
>>> s.intersection(s2, align=False)
0    POLYGON ((0 0, 0 1, 1 1, 0 0))
1    LINESTRING (1 1, 1 2)
2    POINT (1 1)
3    POINT (1 1)
4    POINT (0 1)
dtype: geometry
See also
GeoSeries.difference, GeoSeries.symmetric_difference, GeoSeries.union
- intersects(other: GeoSeries | BaseGeometry, align: bool | None = None) pyspark.pandas.Series [source]
Returns a Series of dtype('bool') with value True for each aligned geometry that intersects other.
An object is said to intersect other if its boundary and interior intersect in any way with those of the other.
The operation works in a 1-to-1 row-wise manner.
- Parameters:
- Return type:
Series (bool)
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         LineString([(1, 0), (1, 3)]),
...         LineString([(2, 0), (0, 2)]),
...         Point(1, 1),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    LINESTRING (0 0, 2 2)
2    LINESTRING (2 0, 0 2)
3    POINT (0 1)
dtype: geometry
>>> s2
1    LINESTRING (1 0, 1 3)
2    LINESTRING (2 0, 0 2)
3    POINT (1 1)
4    POINT (0 1)
dtype: geometry
We can check if each geometry of GeoSeries intersects a single geometry:
>>> line = LineString([(-1, 1), (3, 1)])
>>> s.intersects(line)
0    True
1    True
2    True
3    True
dtype: bool
We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:
>>> s.intersects(s2, align=True)
0    False
1     True
2     True
3    False
4    False
dtype: bool
>>> s.intersects(s2, align=False)
0    True
1    True
2    True
3    True
dtype: bool
Notes
This method works in a row-wise manner. It does not check if an element of one GeoSeries intersects any element of the other one.
- property is_ccw
- property is_closed
- property is_empty: pyspark.pandas.Series
Returns a Series of dtype('bool') with value True for empty geometries.
Examples
An example of a GeoSeries with one empty point, one point and one missing value:
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Point
>>> geoseries = GeoSeries([Point(), Point(2, 1), None], crs="EPSG:4326")
>>> geoseries
0    POINT EMPTY
1    POINT (2 1)
2    None
>>> geoseries.is_empty
0     True
1    False
2    False
dtype: bool
See also
GeoSeries.isna
detect missing geometries
- property is_ring
Return a Series of dtype('bool') with value True for features that are closed.
When constructing a LinearRing, the sequence of coordinates may be explicitly closed by passing identical values in the first and last indices. Otherwise, the sequence will be implicitly closed by copying the first tuple to the last index.
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import LineString, LinearRing
>>> s = GeoSeries(
...     [
...         LineString([(0, 0), (1, 1), (1, -1)]),
...         LineString([(0, 0), (1, 1), (1, -1), (0, 0)]),
...         LinearRing([(0, 0), (1, 1), (1, -1)]),
...     ]
... )
>>> s
0    LINESTRING (0 0, 1 1, 1 -1)
1    LINESTRING (0 0, 1 1, 1 -1, 0 0)
2    LINEARRING (0 0, 1 1, 1 -1, 0 0)
dtype: geometry
>>> s.is_ring
0    False
1     True
2     True
dtype: bool
- property is_simple: pyspark.pandas.Series
Returns a Series of dtype('bool') with value True for geometries that do not cross themselves.
This is meaningful only for LineStrings and LinearRings.
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import LineString
>>> s = GeoSeries(
...     [
...         LineString([(0, 0), (1, 1), (1, -1), (0, 1)]),
...         LineString([(0, 0), (1, 1), (1, -1)]),
...     ]
... )
>>> s
0    LINESTRING (0 0, 1 1, 1 -1, 0 1)
1    LINESTRING (0 0, 1 1, 1 -1)
dtype: geometry
>>> s.is_simple
0    False
1     True
dtype: bool
- property is_valid: pyspark.pandas.Series
Returns a Series of dtype('bool') with value True for geometries that are valid.
Examples
An example with one invalid polygon (a bowtie geometry crossing itself) and one missing geometry:
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         Polygon([(0,0), (1, 1), (1, 0), (0, 1)]),  # bowtie geometry
...         Polygon([(0, 0), (2, 2), (2, 0)]),
...         None
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1    POLYGON ((0 0, 1 1, 1 0, 0 1, 0 0))
2    POLYGON ((0 0, 2 2, 2 0, 0 0))
3    None
dtype: geometry
>>> s.is_valid
0     True
1    False
2     True
3    False
dtype: bool
See also
GeoSeries.is_valid_reason
reason for invalidity
- is_valid_reason() pyspark.pandas.Series [source]
Returns a Series of strings with the reason for invalidity of each geometry.
Examples
An example with two invalid polygons (a bowtie geometry crossing itself and a ring that touches itself) and one missing geometry:
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         Polygon([(0,0), (1, 1), (1, 0), (0, 1)]),  # bowtie geometry
...         Polygon([(0, 0), (2, 2), (2, 0)]),
...         Polygon([(0, 0), (2, 0), (1, 1), (2, 2), (0, 2), (1, 1), (0, 0)]),
...         None
...     ]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1    POLYGON ((0 0, 1 1, 1 0, 0 1, 0 0))
2    POLYGON ((0 0, 2 2, 2 0, 0 0))
3    POLYGON ((0 0, 2 0, 1 1, 2 2, 0 2, 1 1, 0 0))
4    None
dtype: geometry
>>> s.is_valid_reason()
0    Valid Geometry
1    Self-intersection at or near point (0.5, 0.5, NaN)
2    Valid Geometry
3    Ring Self-intersection at or near point (1.0, 1.0)
4    None
dtype: object
See also
GeoSeries.is_valid
detect invalid geometries
GeoSeries.make_valid
fix invalid geometries
- isna() pyspark.pandas.Series [source]
Detect missing values.
- Returns:
A boolean Series of the same size as the GeoSeries, True where a value is NA.
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon
>>> s = GeoSeries(
...     [Polygon([(0, 0), (1, 1), (0, 1)]), None, Polygon([])]
... )
>>> s
0    POLYGON ((0 0, 1 1, 0 1, 0 0))
1    None
2    POLYGON EMPTY
dtype: geometry
>>> s.isna()
0    False
1     True
2    False
dtype: bool
See also
GeoSeries.notna
inverse of isna
GeoSeries.is_empty
detect empty geometries
- isnull() pyspark.pandas.Series [source]
Alias for isna method. See isna for more detail.
- property length: pyspark.pandas.Series
Returns a Series containing the length of each geometry in the GeoSeries.
In the case of a (Multi)Polygon it measures the length of its exterior (i.e. perimeter).
For a GeometryCollection it sums the values for each of the individual geometries.
- Returns:
A Series containing the length of each geometry.
- Return type:
Series
Examples
>>> from shapely.geometry import GeometryCollection, LineString, Point, Polygon
>>> from sedona.spark.geopandas import GeoSeries
>>> gs = GeoSeries(
...     [
...         Point(0, 0),
...         LineString([(0, 0), (1, 1)]),
...         Polygon([(0, 0), (1, 0), (1, 1)]),
...         GeometryCollection(
...             [
...                 Point(0, 0),
...                 LineString([(0, 0), (1, 1)]),
...                 Polygon([(0, 0), (1, 0), (1, 1)]),
...             ]
...         ),
...     ]
... )
>>> gs.length
0    0.000000
1    1.414214
2    3.414214
3    4.828427
dtype: float64
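The values in this example can be reproduced by hand with the Euclidean distance between consecutive vertices. The sketch below uses only the standard library; the helper names are hypothetical and it illustrates the arithmetic rather than Sedona's implementation.

```python
import math


def path_length(coords):
    """Sum the straight-line distances between consecutive vertices."""
    coords = list(coords)
    return sum(math.dist(p, q) for p, q in zip(coords, coords[1:]))


def ring_length(coords):
    """Perimeter of a polygon ring; the ring is closed if it is not already."""
    coords = list(coords)
    if coords[0] != coords[-1]:
        coords.append(coords[0])
    return path_length(coords)


line = [(0, 0), (1, 1)]
triangle = [(0, 0), (1, 0), (1, 1)]

print(round(path_length(line), 6))      # 1.414214
print(round(ring_length(triangle), 6))  # 3.414214
# A collection holding the point, the line and the triangle sums the parts.
print(round(0.0 + path_length(line) + ring_length(triangle), 6))  # 4.828427
```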
- property m: pyspark.pandas.Series
- make_valid(*, method='linework', keep_collapsed=True) GeoSeries [source]
Repairs invalid geometries.
Returns a GeoSeries with valid geometries.
If the input geometry is already valid, then it will be preserved. In many cases, in order to create a valid geometry, the input geometry must be split into multiple parts or multiple geometries. If the geometry must be split into multiple parts of the same type to be made valid, then a multi-part geometry will be returned (e.g. a MultiPolygon). If the geometry must be split into multiple parts of different types to be made valid, then a GeometryCollection will be returned.
In Sedona, only the ‘structure’ method is available:
the ‘structure’ algorithm tries to reason from the structure of the input to find the ‘correct’ repair: exterior rings bound area, interior holes exclude area. It first makes all rings valid, then shells are merged and holes are subtracted from the shells to generate valid result. It assumes that holes and shells are correctly categorized in the input geometry.
- Parameters:
method ({'linework', 'structure'}, default 'linework') – Algorithm to use when repairing geometry. Sedona Geopandas only supports the ‘structure’ method. The default method is “linework” to match compatibility with Geopandas, but it must be explicitly set to ‘structure’ to use the Sedona implementation.
keep_collapsed (bool, default True) – For the ‘structure’ method, True will keep components that have collapsed into a lower dimensionality. For example, a ring collapsing to a line, or a line collapsing to a point.
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import MultiPolygon, Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (0, 2), (1, 1), (2, 2), (2, 0), (1, 1), (0, 0)]),
...         Polygon([(0, 2), (0, 1), (2, 0), (0, 0), (0, 2)]),
...         LineString([(0, 0), (1, 1), (1, 0)]),
...     ],
... )
>>> s
0    POLYGON ((0 0, 0 2, 1 1, 2 2, 2 0, 1 1, 0 0))
1    POLYGON ((0 2, 0 1, 2 0, 0 0, 0 2))
2    LINESTRING (0 0, 1 1, 1 0)
dtype: geometry
>>> s.make_valid(method="structure")
0    MULTIPOLYGON (((1 1, 0 0, 0 2, 1 1)), ((2 0, 1...
1    POLYGON ((0 1, 2 0, 0 0, 0 1))
2    LINESTRING (0 0, 1 1, 1 0)
dtype: geometry
- notna() pyspark.pandas.Series [source]
Detect non-missing values.
- Returns:
A boolean pandas Series of the same size as the GeoSeries,
False where a value is NA.
Examples
>>> from sedona.spark.geopandas import GeoSeries >>> from shapely.geometry import Polygon >>> s = GeoSeries( ... [Polygon([(0, 0), (1, 1), (0, 1)]), None, Polygon([])] ... ) >>> s 0 POLYGON ((0 0, 1 1, 0 1, 0 0)) 1 None 2 POLYGON EMPTY dtype: geometry
>>> s.notna() 0 True 1 False 2 True dtype: bool
See also
GeoSeries.isna
inverse of notna
GeoSeries.is_empty
detect empty geometries
- notnull() pyspark.pandas.Series [source]
Alias for notna method. See notna for more detail.
- overlaps(other, align=None) pyspark.pandas.Series [source]
Returns True for all aligned geometries that overlap other, else False.
In the original Geopandas, geometries overlap if they have more than one but not all points in common, have the same dimension, and the intersection of the interiors of the geometries has the same dimension as the geometries themselves.
However, in Sedona, we return True in the case where the geometries’ points match.
Note: Sedona’s behavior may also differ from Geopandas for GeometryCollections.
The operation works in a 1-to-1 row-wise manner.
- Parameters:
- Return type:
Series (bool)
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, MultiPoint, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         MultiPoint([(0, 0), (0, 1)]),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 0), (0, 2)]),
...         LineString([(0, 1), (1, 1)]),
...         LineString([(1, 1), (3, 3)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
We can check if each geometry of GeoSeries overlaps a single geometry:
>>> polygon = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]) >>> s.overlaps(polygon) 0 True 1 True 2 False 3 False dtype: bool
We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We align both GeoSeries based on index values and compare elements with the same index.
>>> s.overlaps(s2) 0 False 1 True 2 False 3 False 4 False dtype: bool
>>> s.overlaps(s2, align=False) 0 True 1 False 2 True 3 False dtype: bool
Notes
This method works in a row-wise manner. It does not check if an element of one GeoSeries overlaps any element of the other one.
See also
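The two pairing modes used by the row-wise predicates can be sketched in plain Python. This only illustrates the pairing rules described above and is not Sedona's implementation: the helper name rowwise and the toy interval predicate are hypothetical, and a label that is missing on either side is assumed to evaluate to False, as in the aligned output shown in the example.

```python
def intervals_intersect(a, b):
    """Toy predicate standing in for a spatial test such as overlaps."""
    return a[0] < b[1] and b[0] < a[1]


def rowwise(predicate, left, right, align=True):
    """Apply predicate pairwise to two index -> value mappings (hypothetical helper)."""
    if align:
        # Pair values by index label; a label missing on either side yields False.
        labels = sorted(set(left) | set(right))
        return {
            k: (k in left and k in right and predicate(left[k], right[k]))
            for k in labels
        }
    # Ignore labels: pair by position and keep the left-hand index.
    return {
        k: predicate(a, b)
        for (k, a), b in zip(left.items(), right.values())
    }


left = {0: (0, 2), 1: (0, 2), 2: (5, 6)}
right = {1: (1, 3), 2: (7, 8), 3: (0, 1)}

aligned = rowwise(intervals_intersect, left, right, align=True)
positional = rowwise(intervals_intersect, left, right, align=False)
```

With aligned pairing the result covers the union of both indices, while positional pairing keeps only the left-hand index.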
- plot(*args, **kwargs)[source]
Plot a GeoSeries.
Generate a plot of a GeoSeries geometry with matplotlib.
Note: This method is not scalable and requires collecting all data to the driver.
- Parameters:
s (Series) – The GeoSeries to be plotted. Currently Polygon, MultiPolygon, LineString, MultiLineString, Point and MultiPoint geometries can be plotted.
cmap (str (default None)) –
The name of a colormap recognized by matplotlib. Any colormap will work, but categorical colormaps are generally recommended. Examples of useful discrete colormaps include:
tab10, tab20, Accent, Dark2, Paired, Pastel1, Set1, Set2
color (str, np.array, pd.Series, List (default None)) – If specified, all objects will be colored uniformly.
ax (matplotlib.pyplot.Artist (default None)) – axes on which to draw the plot
figsize (pair of floats (default None)) – Size of the resulting matplotlib.figure.Figure. If the argument ax is given explicitly, figsize is ignored.
aspect ('auto', 'equal', None or float (default 'auto')) – Set aspect of axis. If ‘auto’, the default aspect for map plots is ‘equal’; if however data are not projected (coordinates are long/lat), the aspect is by default set to 1/cos(s_y * pi/180) with s_y the y coordinate of the middle of the GeoSeries (the mean of the y range of bounding box) so that a long/lat square appears square in the middle of the plot. This implies an Equirectangular projection. If None, the aspect of ax won’t be changed. It can also be set manually (float) as the ratio of y-unit to x-unit.
autolim (bool (default True)) – Update axes data limits to contain the new geometries.
**style_kwds (dict) – Color options to be passed on to the actual plot function, such as edgecolor, facecolor, linewidth, markersize, alpha.
- Returns:
ax
- Return type:
matplotlib axes instance
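The 'auto' aspect rule for unprojected (long/lat) data described under the aspect parameter can be written out as a short calculation. This is a sketch of the formula 1/cos(s_y * pi/180) only; the function name auto_aspect is hypothetical and the bounding-box values are made up for illustration.

```python
import math


def auto_aspect(miny, maxy):
    """Aspect ratio for long/lat data, following 1 / cos(s_y * pi / 180)."""
    # s_y is the middle of the y range of the bounding box, in degrees.
    s_y = (miny + maxy) / 2.0
    return 1.0 / math.cos(math.radians(s_y))


# Data centred on the equator keeps a 1:1 aspect.
aspect_equator = auto_aspect(-10.0, 10.0)
# Data centred on 60 degrees north is stretched by a factor of about 2.
aspect_60 = auto_aspect(50.0, 70.0)
```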
- segmentize(max_segment_length)[source]
Returns a GeoSeries with vertices added to line segments based on maximum segment length.
Additional vertices will be added to every line segment in an input geometry so that segments are no longer than the provided maximum segment length. New vertices will evenly subdivide each segment. Only linear components of input geometries are densified; other geometries are returned unmodified.
- Parameters:
max_segment_length (float | array-like) – Additional vertices will be added so that all line segments are no longer than this value. Must be greater than 0.
- Return type:
Examples
>>> from sedona.spark.geopandas import GeoSeries >>> from shapely.geometry import Polygon, LineString >>> s = GeoSeries( ... [ ... LineString([(0, 0), (0, 10)]), ... Polygon([(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]), ... ], ... ) >>> s 0 LINESTRING (0 0, 0 10) 1 POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0)) dtype: geometry
>>> s.segmentize(max_segment_length=5) 0 LINESTRING (0 0, 0 5, 0 10) 1 POLYGON ((0 0, 5 0, 10 0, 10 5, 10 10, 5 10, 0... dtype: geometry
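The even subdivision described above can be sketched on a plain list of coordinates. This is an illustration of the idea, not Sedona's implementation; the function name segmentize_coords is hypothetical and only 2D coordinates are handled.

```python
import math


def segmentize_coords(coords, max_segment_length):
    """Insert evenly spaced vertices so no segment exceeds max_segment_length."""
    if max_segment_length <= 0:
        raise ValueError("max_segment_length must be greater than 0")
    out = [tuple(coords[0])]
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        # Number of equal pieces needed for this segment (at least one).
        pieces = max(1, math.ceil(length / max_segment_length))
        for i in range(1, pieces + 1):
            t = i / pieces
            out.append((x0 + (x1 - x0) * t, y0 + (y1 - y0) * t))
    return out


line = segmentize_coords([(0, 0), (0, 10)], 5)
ring = segmentize_coords([(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)], 5)
```

For the line and the polygon ring from the example, this produces the same vertex sequences as the output shown above.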
- set_crs(crs: Any | None = None, epsg: int | None = None, inplace: Literal[True] = True, allow_override: bool = False) None [source]
- set_crs(crs: Any | None = None, epsg: int | None = None, inplace: Literal[False] = False, allow_override: bool = False) GeoSeries
Set the Coordinate Reference System (CRS) of a GeoSeries.
Pass None to remove CRS from the GeoSeries.
Notes
The underlying geometries are not transformed to this CRS. To transform the geometries to a new CRS, use the to_crs method.
- Parameters:
crs (pyproj.CRS | None, optional) – The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g. “EPSG:4326”) or a WKT string.
epsg (int, optional if crs is specified) – EPSG code specifying the projection.
inplace (bool, default False) – If True, the CRS of the GeoSeries will be changed in place (while still returning the result) instead of making a copy of the GeoSeries.
allow_override (bool, default True) – If the GeoSeries already has a CRS, allow to replace the existing CRS, even when both are not equal. In Sedona, setting this to True will lead to eager evaluation instead of lazy evaluation. Unlike Geopandas, True is the default value in Sedona for performance reasons.
- Return type:
Examples
>>> from sedona.spark.geopandas import GeoSeries >>> from shapely.geometry import Point >>> s = GeoSeries([Point(1, 1), Point(2, 2), Point(3, 3)]) >>> s 0 POINT (1 1) 1 POINT (2 2) 2 POINT (3 3) dtype: geometry
Setting CRS to a GeoSeries without one:
>>> s.crs is None True
>>> s = s.set_crs('epsg:3857') >>> s.crs <Projected CRS: EPSG:3857> Name: WGS 84 / Pseudo-Mercator Axis Info [cartesian]: - X[east]: Easting (metre) - Y[north]: Northing (metre) Area of Use: - name: World - 85°S to 85°N - bounds: (-180.0, -85.06, 180.0, 85.06) Coordinate Operation: - name: Popular Visualisation Pseudo-Mercator - method: Popular Visualisation Pseudo Mercator Datum: World Geodetic System 1984 - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
Overriding existing CRS:
>>> s = s.set_crs(4326, allow_override=True)
Without allow_override=True, set_crs returns an error if you try to override CRS.
See also
GeoSeries.to_crs
re-project to another CRS
- simplify(tolerance=None, preserve_topology=True) GeoSeries [source]
Returns a GeoSeries containing a simplified representation of each geometry.
The algorithm (Douglas-Peucker) recursively splits the original line into smaller parts and connects these parts’ endpoints by a straight line. Then, it removes all points whose distance to the straight line is smaller than tolerance. It does not move any points and it always preserves endpoints of the original line or polygon. See https://shapely.readthedocs.io/en/latest/manual.html#object.simplify for details.
Simplifies individual geometries independently, without considering the topology of a potential polygonal coverage. If you would like to treat the GeoSeries as a coverage and simplify its edges, while preserving the coverage topology, see simplify_coverage().
- Parameters:
tolerance (float) – All parts of a simplified geometry will be no more than tolerance distance from the original. It has the same units as the coordinate reference system of the GeoSeries. For example, using tolerance=100 in a projected CRS with meters as units means a distance of 100 meters in reality.
preserve_topology (bool (default True)) – False uses a quicker algorithm, but may produce self-intersecting or otherwise invalid geometries.
Notes
Invalid geometric objects may result from simplification that does not preserve topology and simplification may be sensitive to the order of coordinates: two geometries differing only in order of coordinates may be simplified differently.
See also
simplify_coverage
simplify geometries using coverage simplification
Examples
>>> from sedona.spark.geopandas import GeoSeries >>> from shapely.geometry import Point, LineString >>> s = GeoSeries( ... [Point(0, 0).buffer(1), LineString([(0, 0), (1, 10), (0, 20)])] ... ) >>> s 0 POLYGON ((1 0, 0.99518 -0.09802, 0.98079 -0.19... 1 LINESTRING (0 0, 1 10, 0 20) dtype: geometry
>>> s.simplify(1) 0 POLYGON ((0 1, 0 -1, -1 0, 0 1)) 1 LINESTRING (0 0, 0 20) dtype: geometry
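A minimal pure-Python sketch of the Douglas-Peucker recursion may help make the description concrete. It is not the code Sedona runs; the function names are hypothetical, and the sketch assumes that a vertex lying exactly at the tolerance distance is dropped, which is what the line example above suggests.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(coords, tolerance):
    """Simplify a list of (x, y) vertices, always keeping both endpoints."""
    if len(coords) < 3:
        return list(coords)
    # Find the interior vertex farthest from the chord between the endpoints.
    dmax, idx = 0.0, 0
    for i in range(1, len(coords) - 1):
        d = _point_segment_distance(coords[i], coords[0], coords[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax > tolerance:
        # Keep that vertex and simplify both halves recursively.
        left = douglas_peucker(coords[: idx + 1], tolerance)
        right = douglas_peucker(coords[idx:], tolerance)
        return left[:-1] + right
    return [coords[0], coords[-1]]


simplified = douglas_peucker([(0, 0), (1, 10), (0, 20)], 1)
unchanged = douglas_peucker([(0, 0), (1, 10), (0, 20)], 0.5)
```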
- property sindex: SpatialIndex
Returns a spatial index for the GeoSeries.
Note that the spatial index may not be fully initialized until the first use.
Currently, sindex is not retained when calling this method from a GeoDataFrame. You can work around this by first extracting the active geometry column as a GeoSeries and then calling this method on it.
- Returns:
The spatial index.
- Return type:
Examples
>>> from shapely.geometry import Point, box >>> from sedona.spark.geopandas import GeoSeries >>> >>> s = GeoSeries([Point(x, x) for x in range(5)]) >>> s.sindex.query(box(1, 1, 3, 3)) [Point(1, 1), Point(2, 2), Point(3, 3)] >>> s.has_sindex True
- snap(other, tolerance, align=None) GeoSeries [source]
Snap the vertices and segments of the geometry to vertices of the reference.
Vertices and segments of the input geometry are snapped to vertices of the reference geometry, returning a new geometry; the input geometries are not modified. The result geometry is the input geometry with the vertices and segments snapped. If no snapping occurs then the input geometry is returned unchanged. The tolerance is used to control where snapping is performed.
Where possible, this operation tries to avoid creating invalid geometries; however, it does not guarantee that output geometries will be valid. It is the responsibility of the caller to check for and handle invalid geometries.
Because too much snapping can result in invalid geometries being created, heuristics are used to determine the number and location of snapped vertices that are likely safe to snap. These heuristics may omit some potential snaps that are otherwise within the tolerance.
Note: Sedona’s result may differ slightly from geopandas’s snap() result because of small differences between the underlying engines being used.
The operation works in a 1-to-1 row-wise manner.
- Parameters:
other (GeoSeries or geometric object) – The GeoSeries (elementwise) or geometric object to snap to.
tolerance (float or array like) – Maximum distance between vertices that shall be snapped.
align (bool | None (default None)) – If True, automatically aligns GeoSeries based on their indices. None defaults to True. If False, the order of elements is preserved.
- Return type:
Examples
>>> from sedona.spark.geopandas import GeoSeries >>> from shapely import Polygon, LineString, Point >>> s = GeoSeries( ... [ ... Point(0.5, 2.5), ... LineString([(0.1, 0.1), (0.49, 0.51), (1.01, 0.89)]), ... Polygon([(0, 0), (0, 10), (10, 10), (10, 0), (0, 0)]), ... ], ... ) >>> s 0 POINT (0.5 2.5) 1 LINESTRING (0.1 0.1, 0.49 0.51, 1.01 0.89) 2 POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0)) dtype: geometry
>>> s2 = GeoSeries( ... [ ... Point(0, 2), ... LineString([(0, 0), (0.5, 0.5), (1.0, 1.0)]), ... Point(8, 10), ... ], ... index=range(1, 4), ... ) >>> s2 1 POINT (0 2) 2 LINESTRING (0 0, 0.5 0.5, 1 1) 3 POINT (8 10) dtype: geometry
We can snap each geometry to a single shapely geometry:
>>> s.snap(Point(0, 2), tolerance=1) 0 POINT (0 2) 1 LINESTRING (0.1 0.1, 0.49 0.51, 1.01 0.89) 2 POLYGON ((0 0, 0 2, 0 10, 10 10, 10 0, 0 0)) dtype: geometry
We can also snap two GeoSeries to each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and snap elements with the same index using align=True or ignore index and snap elements based on their matching order using align=False:
>>> s.snap(s2, tolerance=1, align=True)
0    None
1    LINESTRING (0.1 0.1, 0.49 0.51, 1.01 0.89)
2    POLYGON ((0.5 0.5, 1 1, 0 10, 10 10, 10 0, 0.5...
3    None
dtype: geometry
>>> s.snap(s2, tolerance=1, align=False)
0    POINT (0 2)
1    LINESTRING (0 0, 0.5 0.5, 1 1)
2    POLYGON ((0 0, 0 10, 8 10, 10 10, 10 0, 0 0))
dtype: geometry
- to_arrow(geometry_encoding='WKB', interleaved=True, include_z=None)[source]
Encode a GeoSeries to GeoArrow format.
See https://geoarrow.org/ for details on the GeoArrow specification.
This function returns a generic Arrow array object implementing the Arrow PyCapsule Protocol (i.e. having an __arrow_c_array__ method). This object can then be consumed by your Arrow implementation of choice that supports this protocol.
Note: Requires geopandas versions >= 1.0.0 to use with Sedona.
- Parameters:
geometry_encoding ({'WKB', 'geoarrow' }, default 'WKB') – The GeoArrow encoding to use for the data conversion.
interleaved (bool, default True) – Only relevant for ‘geoarrow’ encoding. If True, the geometries’ coordinates are interleaved in a single fixed size list array. If False, the coordinates are stored as separate arrays in a struct type.
include_z (bool, default None) – Only relevant for ‘geoarrow’ encoding (for WKB, the dimensionality of the individual geometries is preserved). If False, return 2D geometries. If True, include the third dimension in the output (if a geometry has no third dimension, the z-coordinates will be NaN). By default, will infer the dimensionality from the input geometries. Note that this inference can be unreliable with empty geometries (for a guaranteed result, it is recommended to specify the keyword).
- Returns:
A generic Arrow array object with geometry data encoded to GeoArrow.
- Return type:
GeoArrowArray
Examples
>>> from sedona.spark.geopandas import GeoSeries >>> from shapely.geometry import Point >>> gser = GeoSeries([Point(1, 2), Point(2, 1)]) >>> gser 0 POINT (1 2) 1 POINT (2 1) dtype: geometry
>>> arrow_array = gser.to_arrow() >>> arrow_array <geopandas.io._geoarrow.GeoArrowArray object at ...>
The returned array object needs to be consumed by a library implementing the Arrow PyCapsule Protocol. For example, wrapping the data as a pyarrow.Array (requires pyarrow >= 14.0):
>>> import pyarrow as pa >>> array = pa.array(arrow_array) >>> array <pyarrow.lib.BinaryArray object at ...> [ 0101000000000000000000F03F0000000000000040, 01010000000000000000000040000000000000F03F ]
- to_crs(crs: Any | None = None, epsg: int | None = None) GeoSeries [source]
Returns a GeoSeries with all geometries transformed to a new coordinate reference system.
Transform all geometries in a GeoSeries to a different coordinate reference system. The crs attribute on the current GeoSeries must be set. Either crs or epsg may be specified for output.
This method will transform all points in all objects. It has no notion of projecting entire geometries. All segments joining points are assumed to be lines in the current projection, not geodesics. Objects crossing the dateline (or other projection boundary) will have undesirable behavior.
- Parameters:
crs (pyproj.CRS, optional if epsg is specified) – The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g. “EPSG:4326”) or a WKT string.
epsg (int, optional if crs is specified) – EPSG code specifying output projection.
- Return type:
Examples
>>> from shapely.geometry import Point >>> from sedona.spark.geopandas import GeoSeries >>> geoseries = GeoSeries([Point(1, 1), Point(2, 2), Point(3, 3)], crs=4326) >>> geoseries.crs <Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
>>> geoseries = geoseries.to_crs(3857) >>> print(geoseries) 0 POINT (111319.491 111325.143) 1 POINT (222638.982 222684.209) 2 POINT (333958.472 334111.171) dtype: geometry >>> geoseries.crs <Projected CRS: EPSG:3857> Name: WGS 84 / Pseudo-Mercator Axis Info [cartesian]: - X[east]: Easting (metre) - Y[north]: Northing (metre) Area of Use: - name: World - 85°S to 85°N - bounds: (-180.0, -85.06, 180.0, 85.06) Coordinate Operation: - name: Popular Visualisation Pseudo-Mercator - method: Popular Visualisation Pseudo Mercator Datum: World Geodetic System 1984 - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
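The point-by-point nature of the transformation can be illustrated for the specific EPSG:4326 to EPSG:3857 case used above. The sketch below applies the standard spherical Web Mercator formulas directly; it is not how Sedona or pyproj perform the transformation internally, the function name is hypothetical, and it assumes the inputs are given as (longitude, latitude) in degrees.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used as the sphere radius by EPSG:3857


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project one (longitude, latitude) pair in degrees to EPSG:3857 metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4.0 + math.radians(lat_deg) / 2.0)
    )
    return x, y


# The three points from the example: (1, 1), (2, 2), (3, 3).
projected = [lonlat_to_web_mercator(v, v) for v in (1, 2, 3)]
```

Each vertex is projected independently, which is why segments between vertices are not treated as geodesics.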
- to_file(path: str, driver: str | None = None, schema: dict | None = None, index: bool | None = None, **kwargs)[source]
Write the GeoSeries to a file.
- Parameters:
path (str) – File path or file handle to write to.
driver (str, optional) – The format driver used to write the file, by default None. If not specified, it’s inferred from the file extension. Available formats are “geojson”, “geopackage”, and “geoparquet”.
index (bool, optional) – If True, writes the index as a column. If False, no index is written. By default None, the index is written only if it is named, is a MultiIndex, or has a non-integer data type.
mode (str, default 'w') – The write mode: ‘w’ to overwrite the existing file or ‘a’ to append.
crs (pyproj.CRS, optional) – The coordinate reference system to write. If None, it is determined from the GeoSeries crs attribute. The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g., “EPSG:4326”) or a WKT string.
**kwargs – Additional keyword arguments passed to the underlying writing engine.
Examples
>>> from shapely.geometry import Point, LineString
>>> from sedona.spark.geopandas import GeoSeries
>>> # Note: Examples write to temporary files for demonstration
>>> import tempfile
>>> import os
Create a GeoSeries:
>>> gs = GeoSeries(
...     [Point(0, 0), LineString([(1, 1), (2, 2)])],
...     index=["a", "b"]
... )
Save to a GeoParquet file:
>>> path_parquet = os.path.join(tempfile.gettempdir(), "data.parquet")
>>> gs.to_file(path_parquet, driver="geoparquet")
Append to a GeoJSON file:
>>> path_json = os.path.join(tempfile.gettempdir(), "data.json")
>>> gs.to_file(path_json, driver="geojson", mode='a')
- to_geopandas() GeoSeries [source]
Convert the GeoSeries to a geopandas GeoSeries.
- Returns:
A geopandas GeoSeries.
- Return type:
geopandas.GeoSeries
- to_json(show_bbox: bool = True, drop_id: bool = False, to_wgs84: bool = False, **kwargs) str [source]
Returns a GeoJSON string representation of the GeoSeries.
- Parameters:
show_bbox (bool, optional, default: True) – Include bbox (bounds) in the geojson
drop_id (bool, default: False) – Whether to omit the index of the GeoSeries from the id property in the generated GeoJSON. The default is False (the index is retained as id), but you may want True if the index is just arbitrary row numbers.
to_wgs84 (bool, optional, default: False) –
If the CRS is set on the active geometry column it is exported as WGS84 (EPSG:4326) to meet the 2016 GeoJSON specification. Set to True to force re-projection and set to False to ignore CRS. False by default.
**kwargs – Keyword arguments that will be passed to json.dumps().
Note: Unlike geopandas, Sedona's implementation will replace 'LinearRing' with 'LineString' in the GeoJSON output.
- Return type:
JSON string
Examples
>>> from sedona.spark.geopandas import GeoSeries >>> from shapely.geometry import Point >>> s = GeoSeries([Point(1, 1), Point(2, 2), Point(3, 3)]) >>> s 0 POINT (1 1) 1 POINT (2 2) 2 POINT (3 3) dtype: geometry
>>> s.to_json() '{"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature", "properties": {}, "geometry": {"type": "Point", "coordinates": [1.0, 1.0]}, "bbox": [1.0, 1.0, 1.0, 1.0]}, {"id": "1", "type": "Feature", "properties": {}, "geometry": {"type": "Point", "coordinates": [2.0, 2.0]}, "bbox": [2.0, 2.0, 2.0, 2.0]}, {"id": "2", "type": "Feature", "properties": {}, "geometry": {"type": "Point", "coordinates": [3.0, 3.0]}, "bbox": [3.0, 3.0, 3.0, 3.0]}], "bbox": [1.0, 1.0, 3.0, 3.0]}'
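The structure of the returned string can be reproduced for point data with the standard json module. This is only a sketch of the FeatureCollection layout shown above, assuming point geometries and the default integer index; the function name points_to_geojson is hypothetical and it is not Sedona's implementation.

```python
import json


def points_to_geojson(points, show_bbox=True):
    """Build a GeoJSON FeatureCollection string from (x, y) pairs."""
    features = []
    for i, (x, y) in enumerate(points):
        x, y = float(x), float(y)
        feature = {
            "id": str(i),  # the index value becomes the feature id
            "type": "Feature",
            "properties": {},
            "geometry": {"type": "Point", "coordinates": [x, y]},
        }
        if show_bbox:
            # A point's bounding box collapses to the point itself.
            feature["bbox"] = [x, y, x, y]
        features.append(feature)
    collection = {"type": "FeatureCollection", "features": features}
    if show_bbox and points:
        xs = [float(x) for x, _ in points]
        ys = [float(y) for _, y in points]
        collection["bbox"] = [min(xs), min(ys), max(xs), max(ys)]
    return json.dumps(collection)


geojson_text = points_to_geojson([(1, 1), (2, 2), (3, 3)])
parsed = json.loads(geojson_text)
```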
See also
GeoSeries.to_file
write GeoSeries to file
- to_parquet(path, **kwargs)[source]
Write the GeoSeries to a GeoParquet file.
- Parameters:
path (str) – The file path where the GeoParquet file will be written.
**kwargs – Additional keyword arguments passed to the underlying writing function.
- Return type:
None
Examples
>>> from shapely.geometry import Point >>> from sedona.spark.geopandas import GeoSeries >>> import tempfile >>> import os >>> gs = GeoSeries([Point(1, 1), Point(2, 2)]) >>> file_path = os.path.join(tempfile.gettempdir(), "my_geodata.parquet") >>> gs.to_parquet(file_path)
- to_spark_pandas() pyspark.pandas.Series [source]
- to_wkb(hex: bool = False, **kwargs) pyspark.pandas.Series [source]
Convert GeoSeries geometries to WKB
- Parameters:
hex (bool) – If true, export the WKB as a hexadecimal string. The default is to return a binary bytes object.
kwargs – Additional keyword args will be passed to shapely.to_wkb().
- Returns:
WKB representations of the geometries
- Return type:
Series
See also
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Point, Polygon
>>> s = GeoSeries(
...     [
...         Point(0, 0),
...         Polygon(),
...         Polygon([(0, 0), (1, 1), (1, 0)]),
...         None,
...     ]
... )
>>> s.to_wkb() 0 b'... 1 b'' 2 b'... 3 None dtype: object
>>> s.to_wkb(hex=True) 0 010100000000000000000000000000000000000000 1 010300000000000000 2 0103000000010000000400000000000000000000000000... 3 None dtype: object
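To show what the hexadecimal strings encode, the WKB for a 2D point can be assembled by hand with the standard struct module. This is a sketch of the well-known little-endian WKB layout for points only, not Sedona's or shapely's encoder; the function name point_to_wkb_hex is hypothetical.

```python
import struct


def point_to_wkb_hex(x, y):
    """Encode a 2D point as little-endian WKB and return it as upper-case hex."""
    # "<" selects little-endian with no padding:
    #   B  byte-order flag (1 = little-endian)
    #   I  geometry type   (1 = Point)
    #   dd x and y as 64-bit floats
    return struct.pack("<BIdd", 1, 1, x, y).hex().upper()


origin_hex = point_to_wkb_hex(0.0, 0.0)
one_two_hex = point_to_wkb_hex(1.0, 2.0)
```

The first value corresponds to the Point(0, 0) row in the hex=True output above: one byte for the byte order, four for the geometry type, and sixteen for the two coordinates.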
- to_wkt(**kwargs) pyspark.pandas.Series [source]
Convert GeoSeries geometries to WKT
Note: Using shapely < 1.0.0 may return different geometries for empty geometries.
- Parameters:
kwargs – Keyword args will be passed to shapely.to_wkt().
- Returns:
WKT representations of the geometries
- Return type:
Series
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Point
>>> s = GeoSeries([Point(1, 1), Point(2, 2), Point(3, 3)])
>>> s
0    POINT (1 1)
1    POINT (2 2)
2    POINT (3 3)
dtype: geometry
>>> s.to_wkt() 0 POINT (1 1) 1 POINT (2 2) 2 POINT (3 3) dtype: object
See also
- property total_bounds
Returns a tuple containing minx, miny, maxx, maxy values for the bounds of the series as a whole.
See GeoSeries.bounds for the bounds of the geometries contained in the series.
Examples
>>> import geopandas
>>> from shapely.geometry import Point, Polygon, LineString
>>> d = {'geometry': [Point(3, -1), Polygon([(0, 0), (1, 1), (1, 0)]),
...                   LineString([(0, 1), (1, 2)])]}
>>> gdf = geopandas.GeoDataFrame(d, crs="EPSG:4326")
>>> gdf.total_bounds
array([ 0., -1.,  3.,  2.])
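The relationship between per-geometry bounds and the total bounds is a simple reduction, sketched here in plain Python. The per-geometry rows are written out by hand for the three geometries in the example, and the function name total_bounds_from_rows is hypothetical.

```python
def total_bounds_from_rows(bounds_rows):
    """Reduce (minx, miny, maxx, maxy) rows to the bounds of the whole series."""
    minxs, minys, maxxs, maxys = zip(*bounds_rows)
    return (min(minxs), min(minys), max(maxxs), max(maxys))


rows = [
    (3.0, -1.0, 3.0, -1.0),  # Point(3, -1)
    (0.0, 0.0, 1.0, 1.0),    # Polygon([(0, 0), (1, 1), (1, 0)])
    (0.0, 1.0, 1.0, 2.0),    # LineString([(0, 1), (1, 2)])
]
overall = total_bounds_from_rows(rows)
```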
- touches(other, align=None) pyspark.pandas.Series [source]
Returns a Series of dtype('bool') with value True for each aligned geometry that touches other.
An object is said to touch other if it has at least one point in common with other and its interior does not intersect with any part of the other. Overlapping features therefore do not touch.
Note: Sedona’s behavior may also differ from Geopandas for GeometryCollections.
The operation works in a 1-to-1 row-wise manner.
- Parameters:
- Return type:
Series (bool)
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, MultiPoint, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         LineString([(0, 0), (2, 2)]),
...         MultiPoint([(0, 0), (0, 1)]),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (-2, 0), (0, -2)]),
...         LineString([(0, 1), (1, 1)]),
...         LineString([(1, 1), (3, 0)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s 0 POLYGON ((0 0, 2 2, 0 2, 0 0)) 1 POLYGON ((0 0, 2 2, 0 2, 0 0)) 2 LINESTRING (0 0, 2 2) 3 MULTIPOINT ((0 0), (0 1)) dtype: geometry
>>> s2 1 POLYGON ((0 0, -2 0, 0 -2, 0 0)) 2 LINESTRING (0 1, 1 1) 3 LINESTRING (1 1, 3 0) 4 POINT (0 1) dtype: geometry
We can check if each geometry of GeoSeries touches a single geometry:
>>> line = LineString([(0, 0), (-1, -2)]) >>> s.touches(line) 0 True 1 True 2 True 3 True dtype: bool
We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:
>>> s.touches(s2, align=True)
0    False
1    True
2    True
3    False
4    False
dtype: bool
>>> s.touches(s2, align=False)
0    True
1    False
2    True
3    False
dtype: bool
Notes
This method works in a row-wise manner. It does not check if an element of one GeoSeries touches any element of the other one.
See also
- property type
- property unary_union
- union_all(method='unary', grid_size=None) BaseGeometry [source]
Returns a geometry containing the union of all geometries in the GeoSeries.
Sedona does not support the method or grid_size argument, so the user does not need to manually decide the algorithm being used.
- Parameters:
method (str (default "unary")) – Not supported in Sedona.
grid_size (float, default None) – Not supported in Sedona.
Examples
>>> from sedona.spark.geopandas import GeoSeries >>> from shapely.geometry import box >>> s = GeoSeries([box(0, 0, 1, 1), box(0, 0, 2, 2)]) >>> s 0 POLYGON ((1 0, 1 1, 0 1, 0 0, 1 0)) 1 POLYGON ((2 0, 2 2, 0 2, 0 0, 2 0)) dtype: geometry
>>> s.union_all() <POLYGON ((0 1, 0 2, 2 2, 2 0, 1 0, 0 0, 0 1))>
- within(other, align=None) pyspark.pandas.Series [source]
Returns a Series of dtype('bool') with value True for each aligned geometry that is within other.
An object is said to be within other if at least one of its points is located in the interior and no points are located in the exterior of the other. If either object is empty, this operation returns False.
This is the inverse of contains in the sense that the expression a.within(b) == b.contains(a) always evaluates to True.
Note: Sedona’s behavior may also differ from Geopandas for GeometryCollections and for geometries that are equal.
The operation works in a 1-to-1 row-wise manner.
- Parameters:
- Return type:
Series (bool)
Examples
>>> from sedona.spark.geopandas import GeoSeries
>>> from shapely.geometry import Polygon, LineString, Point
>>> s = GeoSeries(
...     [
...         Polygon([(0, 0), (2, 2), (0, 2)]),
...         Polygon([(0, 0), (1, 2), (0, 2)]),
...         LineString([(0, 0), (0, 2)]),
...         Point(0, 1),
...     ],
... )
>>> s2 = GeoSeries(
...     [
...         Polygon([(0, 0), (1, 1), (0, 1)]),
...         LineString([(0, 0), (0, 2)]),
...         LineString([(0, 0), (0, 1)]),
...         Point(0, 1),
...     ],
...     index=range(1, 5),
... )
>>> s 0 POLYGON ((0 0, 2 2, 0 2, 0 0)) 1 POLYGON ((0 0, 1 2, 0 2, 0 0)) 2 LINESTRING (0 0, 0 2) 3 POINT (0 1) dtype: geometry
>>> s2
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2    LINESTRING (0 0, 0 2)
3    LINESTRING (0 0, 0 1)
4    POINT (0 1)
dtype: geometry
We can check if each geometry of GeoSeries is within a single geometry:
>>> polygon = Polygon([(0, 0), (2, 2), (0, 2)]) >>> s.within(polygon) 0 True 1 True 2 False 3 False dtype: bool
We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:
>>> s2.within(s)
0    False
1    False
2    True
3    False
4    False
dtype: bool
>>> s2.within(s, align=False)
1    True
2    False
3    True
4    True
dtype: bool
Notes
This method works in a row-wise manner. It does not check if an element of one GeoSeries is within any element of the other one.
See also
- property x: pyspark.pandas.Series
Return the x location of point geometries in a GeoSeries
- Return type:
Examples
>>> from sedona.spark.geopandas import GeoSeries >>> from shapely.geometry import Point >>> s = GeoSeries([Point(1, 1), Point(2, 2), Point(3, 3)]) >>> s.x 0 1.0 1 2.0 2 3.0 dtype: float64
See also
- property y: pyspark.pandas.Series
Return the y location of point geometries in a GeoSeries
- Return type:
Examples
>>> from sedona.spark.geopandas import GeoSeries >>> from shapely.geometry import Point >>> s = GeoSeries([Point(1, 1), Point(2, 2), Point(3, 3)]) >>> s.y 0 1.0 1 2.0 2 3.0 dtype: float64
See also
- property z: pyspark.pandas.Series
Return the z location of point geometries in a GeoSeries
- Return type:
Examples
>>> from sedona.spark.geopandas import GeoSeries >>> from shapely.geometry import Point >>> s = GeoSeries([Point(1, 1, 1), Point(2, 2, 2), Point(3, 3, 3)]) >>> s.z 0 1.0 1 2.0 2 3.0 dtype: float64
See also
sedona.spark.geopandas.io module
- sedona.spark.geopandas.io.read_file(filename: str, format: str | None = None, **kwargs)[source]
Alternate constructor to create a GeoDataFrame from a file.
- Parameters:
filename (str) – File path or file handle to read from. If the path is a directory, Sedona will read all files in the directory into a dataframe.
format (str, default None) –
The format of the file to read. If None, Sedona will infer the format from the file extension. Note, inferring the format from the file extension is not supported for directories. Options:
"shapefile"
"geojson"
"geopackage"
"geoparquet"
See also
GeoDataFrame.to_file
write GeoDataFrame to file
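The format inference described for the format parameter can be pictured as a lookup on the file extension. The sketch below is hypothetical: the extension table and the function name infer_format are assumptions made for illustration, and the extensions Sedona actually recognizes may differ. It does reflect the documented rule that a directory path, which has no extension, needs an explicit format.

```python
import os

# Hypothetical extension table; the extensions Sedona accepts may differ.
EXTENSION_TO_FORMAT = {
    ".shp": "shapefile",
    ".geojson": "geojson",
    ".json": "geojson",
    ".gpkg": "geopackage",
    ".parquet": "geoparquet",
}


def infer_format(filename, format=None):
    """Return the explicit format if given, else guess it from the extension."""
    if format is not None:
        return format.lower()
    ext = os.path.splitext(filename)[1].lower()
    if ext not in EXTENSION_TO_FORMAT:
        # Directories and unknown extensions cannot be inferred.
        raise ValueError(f"Cannot infer format for {filename!r}; pass format=...")
    return EXTENSION_TO_FORMAT[ext]


inferred = infer_format("data.parquet")
explicit = infer_format("some_dir", format="geojson")
```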
- sedona.spark.geopandas.io.read_parquet(path, columns=None, storage_options=None, bbox=None, to_pandas_kwargs=None, **kwargs)[source]
Load a Parquet object from the file path, returning a GeoDataFrame.
If no geometry columns are read, this will raise a ValueError; in that case, use the pandas read_parquet method instead.
If ‘crs’ key is not present in the GeoParquet metadata associated with the Parquet object, it will default to “OGC:CRS84” according to the specification.
- Parameters:
path (str, path object)
columns (list-like of strings, default=None) – Not currently supported in Sedona
storage_options (dict, optional) – Not currently supported in Sedona
bbox (tuple, optional) – Not currently supported in Sedona
to_pandas_kwargs (dict, optional) – Not currently supported in Sedona
- Return type:
Examples
>>> from sedona.spark.geopandas import read_parquet
>>> df = read_parquet("data.parquet")  # doctest: +SKIP
Specifying columns to read:
>>> df = read_parquet( ... "data.parquet", ... )
sedona.spark.geopandas.sindex module
- class sedona.spark.geopandas.sindex.SpatialIndex(geometry, index_type='strtree', column_name=None)[source]
Bases:
object
A wrapper around Sedona’s spatial index functionality.
- __init__(geometry, index_type='strtree', column_name=None)[source]
Initialize the SpatialIndex with geometry data.
- property is_empty
Check if the spatial index is empty.
- Returns:
True if the index is empty, False otherwise.
- Return type:
- nearest(geometry, k=1, return_distance=False)[source]
Find the nearest geometry in the spatial index.
- Parameters:
- Returns:
List of indices of nearest geometries, optionally with distances.
- Return type:
Module contents
Added in version 1.8.0: geopandas API on Sedona