sedona.spark.stac package

Submodules

sedona.spark.stac.client module

class sedona.spark.stac.client.Client(url: str)

Bases: object

__init__(url: str)

get_collection(collection_id: str)

Retrieves a collection client for the specified collection ID.

This method creates an instance of the CollectionClient class for the given collection ID, allowing interaction with the specified collection in the STAC API.

Parameters:

collection_id – The ID of the collection to retrieve. Example: “aster-l1t”

Returns:

An instance of the CollectionClient class for the specified collection.

get_collection_from_catalog()

Retrieves the catalog from the STAC API.

This method fetches the root catalog from the STAC API, providing access to all collections and items.

Returns:

The root catalog of the STAC API.

Return type:

dict

classmethod open(url: str)

Opens a connection to the specified STAC API URL.

This class method creates an instance of the Client class with the given URL.

Parameters:

url – The URL of the STAC API to connect to. Example: “https://planetarycomputer.microsoft.com/api/stac/v1”

Returns:

An instance of the Client class connected to the specified URL.
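
As a minimal sketch of how open and get_collection fit together, the helper below opens a client and returns a collection client for one collection. The helper name open_collection and its client_cls parameter are hypothetical and exist only for this example (client_cls lets a stand-in class be substituted where Sedona is not installed). The import path follows the module name documented above, and the URL and collection ID in the usage comment are the examples given in this section.

```python
def open_collection(url, collection_id, client_cls=None):
    """Open a STAC API client and return a client for one collection.

    `client_cls` is a hypothetical hook for this sketch: when it is omitted,
    the documented sedona.spark.stac.client.Client class is imported lazily.
    """
    if client_cls is None:
        # Deferred import so this module loads even where Sedona is absent.
        from sedona.spark.stac.client import Client as client_cls
    client = client_cls.open(url)  # documented classmethod
    return client.get_collection(collection_id)  # documented method


# Usage, assuming Sedona is installed and the endpoint is reachable:
# collection = open_collection(
#     "https://planetarycomputer.microsoft.com/api/stac/v1",
#     "aster-l1t",
# )
```

Whether a Spark session must already be running before these calls is not stated here, so it is safest to create one first.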

search(*ids: str | list, collection_id: str | None = None, bbox: list | None = None, geometry: str | BaseGeometry | List[str | BaseGeometry] | None = None, datetime: str | datetime | list | None = None, max_items: int | None = None, return_dataframe: bool = True) → Iterator[pystac.Item] | DataFrame

Searches for items in the specified collection with optional filters.

Parameters:
  • ids – A variable number of item IDs to filter the items. Example: “item_id1” or [“item_id1”, “item_id2”]

  • collection_id – The ID of the collection to search in. Example: “aster-l1t”

  • bbox – A list of bounding boxes for filtering the items. Each bounding box is represented as a list of four float values: [min_lon, min_lat, max_lon, max_lat]. Example: [[-180.0, -90.0, 180.0, 90.0]] # This bounding box covers the entire world.

  • geometry – Shapely geometry object(s) or WKT string(s) for spatial filtering. Can be a single geometry, WKT string, or a list of geometries/WKT strings. If both bbox and geometry are provided, geometry takes precedence. Example: Polygon(…) or “POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))” or [Polygon(…), Polygon(…)]

  • datetime

    A single datetime, RFC 3339-compliant timestamp, or a list of date-time ranges for filtering the items. The datetime can be specified in various formats:

    • “YYYY” expands to [“YYYY-01-01T00:00:00Z”, “YYYY-12-31T23:59:59Z”]

    • “YYYY-mm” expands to [“YYYY-mm-01T00:00:00Z”, “YYYY-mm-<last_day>T23:59:59Z”]

    • “YYYY-mm-dd” expands to [“YYYY-mm-ddT00:00:00Z”, “YYYY-mm-ddT23:59:59Z”]

    • “YYYY-mm-ddTHH:MM:SSZ” remains as [“YYYY-mm-ddTHH:MM:SSZ”, “YYYY-mm-ddTHH:MM:SSZ”]

    • A list of date-time ranges can be provided for multiple intervals.

    Example: “2020-01-01T00:00:00Z” or python_datetime.datetime(2020, 1, 1) or [[“2020-01-01T00:00:00Z”, “2021-01-01T00:00:00Z”]]

  • max_items – The maximum number of items to return from the search, even if there are more matching results. Example: 100

  • return_dataframe – If True, return the result as a Spark DataFrame; if False, return an iterator of pystac.Item objects. Example: True

Returns:

A Spark DataFrame or an iterator of pystac.Item objects, depending on return_dataframe, containing the items that match the specified filters.
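
The datetime expansion rules listed for the datetime parameter can be illustrated with a short pure-Python sketch. This is not Sedona's implementation: the function names expand_datetime and search_year are hypothetical, and the code only mirrors the four documented string forms. The search_year helper shows how such a string might be passed to the documented search method, which is expected to perform the expansion itself.

```python
import calendar
import re


def expand_datetime(value: str) -> list:
    """Expand a partial date string into a [start, end] pair of timestamps."""
    if re.fullmatch(r"\d{4}", value):  # "YYYY"
        return [f"{value}-01-01T00:00:00Z", f"{value}-12-31T23:59:59Z"]
    if re.fullmatch(r"\d{4}-\d{2}", value):  # "YYYY-mm"
        year, month = (int(part) for part in value.split("-"))
        last_day = calendar.monthrange(year, month)[1]  # handles leap years
        return [f"{value}-01T00:00:00Z", f"{value}-{last_day:02d}T23:59:59Z"]
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):  # "YYYY-mm-dd"
        return [f"{value}T00:00:00Z", f"{value}T23:59:59Z"]
    # A full timestamp is kept as a range with identical start and end.
    return [value, value]


def search_year(client, collection_id, year, max_items=100):
    """Search one collection for a calendar year and return an item iterator.

    `client` is assumed to be an object exposing the documented search method.
    """
    return client.search(
        collection_id=collection_id,
        datetime=year,  # e.g. "2020"
        max_items=max_items,
        return_dataframe=False,  # iterator of items instead of a DataFrame
    )
```

For instance, under these rules “2020-02” covers the whole of February 2020, including the leap day.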

sedona.spark.stac.collection_client module

class sedona.spark.stac.collection_client.CollectionClient(url: str, collection_id: str | None = None)

Bases: object

__init__(url: str, collection_id: str | None = None)

get_dataframe(*ids: str | list, bbox: list | None = None, geometry: str | BaseGeometry | List[str | BaseGeometry] | None = None, datetime: str | datetime | list | None = None, max_items: int | None = None) → DataFrame

Returns a Spark DataFrame of items with optional spatial and temporal extents.

This method loads the collection data from the specified collection URL and applies optional spatial and temporal filters to the data. The spatial filter is applied using a bounding box or a geometry, and the temporal filter is applied using a date-time range.

Parameters:
  • ids – A variable number of item IDs to filter the items. Example: “item_id1” or [“item_id1”, “item_id2”]

  • bbox – A list of bounding boxes for filtering the items. Each bounding box is represented as a list of four float values: [min_lon, min_lat, max_lon, max_lat]. Example: [[-180.0, -90.0, 180.0, 90.0]] # This bounding box covers the entire world.

  • geometry – Shapely geometry object(s) or WKT string(s) for spatial filtering. Can be a single geometry, WKT string, or a list of geometries/WKT strings. If both bbox and geometry are provided, geometry takes precedence. Example: Polygon(…) or “POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))” or [Polygon(…), Polygon(…)]

  • datetime – A single datetime, RFC 3339-compliant timestamp, or a list of date-time ranges for filtering the items. Example: “2020-01-01T00:00:00Z” or python_datetime.datetime(2020, 1, 1) or [[“2020-01-01T00:00:00Z”, “2021-01-01T00:00:00Z”]]

  • max_items – The maximum number of items to return from the search.

Returns:

A Spark DataFrame containing the filtered items. If no filters are provided, the DataFrame contains all items in the collection.

Raises:

RuntimeError – If there is an error loading the data or applying the filters, a RuntimeError is raised with a message indicating the failure.
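
The bbox format described above (a list of four-value boxes) is easy to get wrong, so a small check before calling get_dataframe can help. The sketch below is illustrative only: check_bboxes and load_filtered are hypothetical helper names, the validation rules are this example's own and not Sedona's, and collection_client is assumed to be an object exposing the documented get_dataframe method.

```python
def check_bboxes(bboxes):
    """Validate a list of [min_lon, min_lat, max_lon, max_lat] boxes."""
    checked = []
    for box in bboxes:
        if len(box) != 4:
            raise ValueError(f"expected four values, got {len(box)}: {box!r}")
        min_lon, min_lat, max_lon, max_lat = (float(v) for v in box)
        # Longitudes are not compared: a box crossing the antimeridian
        # can legitimately have min_lon greater than max_lon.
        if min_lat > max_lat:
            raise ValueError(f"min_lat exceeds max_lat: {box!r}")
        checked.append([min_lon, min_lat, max_lon, max_lat])
    return checked


def load_filtered(collection_client, bboxes, date_ranges, max_items=None):
    """Call the documented get_dataframe method with validated filters."""
    return collection_client.get_dataframe(
        bbox=check_bboxes(bboxes),
        datetime=date_ranges,  # e.g. [["2020-01-01T00:00:00Z", "2021-01-01T00:00:00Z"]]
        max_items=max_items,
    )
```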

get_items(*ids: str | list, bbox: list | None = None, geometry: str | BaseGeometry | List[str | BaseGeometry] | None = None, datetime: str | datetime | list | None = None, max_items: int | None = None) → Iterator[pystac.Item]

Returns an iterator of items that match the supplied item IDs and/or the optional spatial and temporal filters.

This method loads the collection data from the specified collection URL and applies optional filters to the data.

Parameters:
  • ids – A variable number of item IDs to filter the items. If not provided, no ID filtering is applied.

  • bbox – A list of bounding boxes for filtering the items.

  • geometry – Shapely geometry object(s) or WKT string(s) for spatial filtering. Can be a single geometry, WKT string, or a list of geometries/WKT strings. If both bbox and geometry are provided, geometry takes precedence.

  • datetime – A single datetime, RFC 3339-compliant timestamp, or a list of date-time ranges for filtering the items.

  • max_items – The maximum number of items to return from the search, even if there are more matching results.

Returns:

An iterator of pystac.Item objects that match the specified filters. If no filters are provided, the iterator contains all items in the collection.

Raises:

RuntimeError – If there is an error loading the data or applying the filters, a RuntimeError is raised with a message indicating the failure.
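
Because get_items is documented to raise RuntimeError on failure, callers may want to handle that case explicitly. The following sketch shows one way to do so. The helper name collect_item_ids is hypothetical, collection_client is assumed to expose the documented get_items method, and returning an empty list on failure is a choice made for this example only.

```python
def collect_item_ids(collection_client, **filters):
    """Return the IDs of matching items, or an empty list if the query fails.

    `filters` are passed straight to get_items, e.g. bbox=..., datetime=...,
    max_items=...
    """
    try:
        # The comprehension sits inside the try block so that errors raised
        # while the iterator is being consumed are caught as well.
        return [item.id for item in collection_client.get_items(**filters)]
    except RuntimeError as err:
        print(f"STAC query failed: {err}")
        return []
```

Each yielded object is a pystac.Item, whose id attribute holds the item ID.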

load_items_df(bbox, geometry, datetime, ids, max_items)

save_to_geoparquet(*ids: str | list, output_path: str, bbox: list | None = None, geometry: str | BaseGeometry | List[str | BaseGeometry] | None = None, datetime: list | None = None) → None

Loads the STAC DataFrame and saves it to Parquet format at the given output path.

This method loads the collection data from the specified collection URL and applies optional spatial and temporal filters to the data. The filtered data is then saved to the specified output path in Parquet format.

Parameters:
  • ids – A variable number of item IDs to filter the items. If not provided, no ID filtering is applied.

  • output_path – The path where the Parquet file will be saved.

  • bbox – A bounding box for filtering the items. If not provided, no spatial filtering is applied.

  • geometry – Shapely geometry object(s) or WKT string(s) for spatial filtering. If both bbox and geometry are provided, geometry takes precedence.

  • datetime – A temporal extent that defines the date-time range for filtering the items. If not provided, no temporal filtering is applied. To match a single datetime, set the start and end of the range to the same value. Example: [[“2020-01-01T00:00:00Z”, “2020-01-01T00:00:00Z”]]

Raises:

RuntimeError – If there is an error loading the data, applying the filters, or saving the DataFrame to Parquet format, a RuntimeError is raised with a message indicating the failure.
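
The single-datetime case described for the datetime parameter can be sketched as a small wrapper. The helper name save_instant is hypothetical, and collection_client is assumed to be an object exposing the documented save_to_geoparquet method; only keyword arguments listed in the signature above are used.

```python
def save_instant(collection_client, output_path, instant, bbox=None):
    """Save the items whose datetime equals one instant to Parquet."""
    collection_client.save_to_geoparquet(
        output_path=output_path,
        bbox=bbox,  # None means no spatial filtering
        datetime=[[instant, instant]],  # identical start and end: one instant
    )


# Usage, assuming `collection` came from Client.get_collection(...):
# save_instant(collection, "/tmp/items.parquet", "2020-01-01T00:00:00Z")
```

The output path in the usage comment is a placeholder chosen for illustration.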

sedona.spark.stac.collection_client.get_collection_url(url: str, collection_id: str | None = None) → str

Constructs the collection URL based on the provided base URL and optional collection ID.

If the collection ID is provided and the URL starts with ‘http’ or ‘https’, the collection ID is appended to the URL. If a collection ID is provided but the URL does not start with ‘http’ or ‘https’, a ValueError is raised.

Parameters:
  • url (str) – The base URL of the STAC collection.

  • collection_id (Optional[str]) – The optional collection ID to append to the URL.

Returns:

The constructed collection URL.

Return type:

str

Raises:

ValueError – If the URL does not start with ‘http’ or ‘https’ and a collection ID is provided.
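
The behaviour described for get_collection_url can be approximated with the stand-alone sketch below. It is a reconstruction from the description above and not the library's source. The name build_collection_url is hypothetical, and two details are assumptions: that the ID is appended using the STAC API path convention /collections/{collection_id}, and that the URL is returned unchanged when no collection ID is given.

```python
from typing import Optional


def build_collection_url(url: str, collection_id: Optional[str] = None) -> str:
    """Build a collection URL from a base URL and an optional collection ID."""
    if not collection_id:
        # Assumption: without a collection ID the URL is returned unchanged.
        return url
    if url.startswith("http"):  # this prefix also covers "https"
        # Assumption: STAC API convention of /collections/{collection_id}.
        return f"{url.rstrip('/')}/collections/{collection_id}"
    raise ValueError(
        "A collection ID was provided but the URL does not start with "
        "'http' or 'https'"
    )
```

Under these assumptions, the example URL and collection ID used earlier in this section would combine into https://planetarycomputer.microsoft.com/api/stac/v1/collections/aster-l1t.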

Module contents