sedona.spark.stac package

Submodules

sedona.spark.stac.client module

class sedona.spark.stac.client.Client(url: str, headers: dict | None = None)

Bases: object

__init__(url: str, headers: dict | None = None)

Initializes a STAC client with optional authentication headers.

Parameters:
  • url – The URL of the STAC API to connect to.

  • headers – Optional dictionary of HTTP headers to include in requests. Can be used for authentication or custom headers.

get_collection(collection_id: str)

Retrieves a collection client for the specified collection ID.

This method creates an instance of the CollectionClient class for the given collection ID, allowing interaction with the specified collection in the STAC API.

Parameters:

collection_id – The ID of the collection to retrieve. Example: “aster-l1t”

Returns:

An instance of the CollectionClient class for the specified collection.

get_collection_from_catalog()

Retrieves the catalog from the STAC API.

This method fetches the root catalog from the STAC API, providing access to all collections and items.

Returns:

The root catalog of the STAC API.

Return type:

dict
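Because the catalog comes back as a plain dict, it can be inspected with ordinary Python. The sketch below assumes the dict follows the STAC catalog layout, where a `links` array contains entries with `rel: "child"` pointing to sub-catalogs or collections. `child_links` is a hypothetical helper written for illustration, not part of Sedona, and some STAC APIs expose collections through other link relations instead.

```python
from typing import Any, Dict, List


def child_links(catalog: Dict[str, Any]) -> List[str]:
    """Return the hrefs of all links whose rel is "child" in a STAC catalog dict."""
    return [
        link["href"]
        for link in catalog.get("links", [])
        if link.get("rel") == "child" and "href" in link
    ]


# A tiny hand-written catalog in the shape described by the STAC spec (hypothetical data).
root = {
    "type": "Catalog",
    "id": "example-root",
    "links": [
        {"rel": "self", "href": "https://example.com/stac/v1"},
        {"rel": "child", "href": "https://example.com/stac/v1/collections/aster-l1t"},
    ],
}

print(child_links(root))
# ['https://example.com/stac/v1/collections/aster-l1t']
```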

classmethod open(url: str, headers: dict | None = None)

Opens a connection to the specified STAC API URL.

This class method creates an instance of the Client class with the given URL and optional authentication headers.

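Parameters:
  • url – The URL of the STAC API to connect to.

  • headers – Optional dictionary of HTTP headers to include in requests. Can be used for authentication or custom headers.
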
Returns:

An instance of the Client class connected to the specified URL.

Example usage:

    from sedona.spark.stac.client import Client

    # Without authentication
    client = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

    # With custom headers
    client = Client.open(
        "https://example.com/stac/v1", headers={"Authorization": "Bearer token123"}
    )

    # Using convenience methods
    client = Client.open("https://example.com/stac/v1")
    client.with_basic_auth("username", "password")

search(*ids: str | list, collection_id: str | None = None, bbox: list | None = None, geometry: str | BaseGeometry | List[str | BaseGeometry] | None = None, datetime: str | datetime | list | None = None, max_items: int | None = None, return_dataframe: bool = True) → Iterator | DataFrame

Searches for items in the specified collection with optional filters.

Parameters:
  • ids – A variable number of item IDs to filter the items. Example: “item_id1” or [“item_id1”, “item_id2”]

  • collection_id – The ID of the collection to search in. Example: “aster-l1t”

  • bbox – A list of bounding boxes for filtering the items. Each bounding box is represented as a list of four float values: [min_lon, min_lat, max_lon, max_lat]. Example: [[-180.0, -90.0, 180.0, 90.0]], which covers the entire world.

  • geometry – Shapely geometry object(s) or WKT string(s) for spatial filtering. Can be a single geometry, WKT string, or a list of geometries/WKT strings. If both bbox and geometry are provided, geometry takes precedence. Example: Polygon(…) or “POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))” or [Polygon(…), Polygon(…)]

  • datetime – A single datetime, RFC 3339-compliant timestamp, or a list of date-time ranges for filtering the items. The following formats are accepted:

    • “YYYY” expands to [“YYYY-01-01T00:00:00Z”, “YYYY-12-31T23:59:59Z”]

    • “YYYY-mm” expands to [“YYYY-mm-01T00:00:00Z”, “YYYY-mm-<last_day>T23:59:59Z”]

    • “YYYY-mm-dd” expands to [“YYYY-mm-ddT00:00:00Z”, “YYYY-mm-ddT23:59:59Z”]

    • “YYYY-mm-ddTHH:MM:SSZ” remains as [“YYYY-mm-ddTHH:MM:SSZ”, “YYYY-mm-ddTHH:MM:SSZ”]

    • A list of date-time ranges can be provided for multiple intervals.

    Example: “2020-01-01T00:00:00Z” or python_datetime.datetime(2020, 1, 1) or [[“2020-01-01T00:00:00Z”, “2021-01-01T00:00:00Z”]]

  • max_items – The maximum number of items to return from the search, even if there are more matching results. Example: 100

  • return_dataframe – If True, return the result as a Spark DataFrame instead of an iterator of PyStacItem objects. Example: True

Returns:

A Spark DataFrame (when return_dataframe is True) or an iterator of PyStacItem objects (when it is False) containing the items that match the specified filters.
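The datetime expansion rules listed for search can be illustrated with a small standard-library sketch. `expand_datetime` is a hypothetical helper written for this page: it mirrors the documented rules for string inputs only and is not Sedona's internal implementation.

```python
import calendar
from typing import List


def expand_datetime(value: str) -> List[str]:
    """Expand a (partial) date string into a [start, end] range per the rules above."""
    if "T" in value:
        # Full timestamp: start and end are the same instant.
        return [value, value]
    parts = value.split("-")
    if len(parts) == 1:
        # "YYYY": the whole year.
        year = parts[0]
        return [f"{year}-01-01T00:00:00Z", f"{year}-12-31T23:59:59Z"]
    if len(parts) == 2:
        # "YYYY-mm": the whole month; monthrange()[1] is the number of days in it.
        year, month = parts
        last_day = calendar.monthrange(int(year), int(month))[1]
        return [f"{year}-{month}-01T00:00:00Z", f"{year}-{month}-{last_day:02d}T23:59:59Z"]
    if len(parts) == 3:
        # "YYYY-mm-dd": the whole day.
        return [f"{value}T00:00:00Z", f"{value}T23:59:59Z"]
    raise ValueError(f"Unsupported datetime format: {value!r}")


print(expand_datetime("2020-02"))
# ['2020-02-01T00:00:00Z', '2020-02-29T23:59:59Z']
```

The month case relies on calendar.monthrange so that leap years and 30-day months get the correct last day.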

with_basic_auth(username: str, password: str)

Adds HTTP Basic Authentication to the client.

This method encodes the username and password using Base64 and adds the appropriate Authorization header for HTTP Basic Authentication.

Parameters:
  • username – The username for authentication. For API keys, this is typically the API key itself.

  • password – The password for authentication. For API keys, this is often left empty.

Returns:

Self for method chaining.

Example usage:

    from sedona.spark.stac.client import Client

    url = "https://example.com/stac/v1"
    api_key = "api_key_xyz"

    # Standard basic auth
    client = Client.open(url)
    client.with_basic_auth("user", "pass")

    # API key as username (common pattern)
    client.with_basic_auth(api_key, "")

    # Method chaining
    df = Client.open(url).with_basic_auth(api_key, "").search(collection_id="test")
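The encoding described for with_basic_auth is standard HTTP Basic authentication (RFC 7617): the string "username:password" is Base64-encoded and sent as "Basic <encoded>". The sketch below shows the resulting header value with the standard library. `basic_auth_header` is a hypothetical helper, and the UTF-8 encoding of the credentials is an assumption about what Sedona does.

```python
import base64
from typing import Dict


def basic_auth_header(username: str, password: str) -> Dict[str, str]:
    """Build an HTTP Basic Authorization header from a username and password."""
    credentials = f"{username}:{password}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


print(basic_auth_header("user", "pass"))
# {'Authorization': 'Basic dXNlcjpwYXNz'}

# API-key pattern: key as username, empty password (the colon is still encoded).
print(basic_auth_header("api_key_xyz", ""))
```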

with_bearer_token(token: str)

Adds Bearer Token Authentication to the client.

This method adds the appropriate Authorization header for Bearer Token authentication, commonly used with OAuth2 and API tokens.

Parameters:

token – The bearer token for authentication.

Returns:

Self for method chaining.

Example usage:

    from sedona.spark.stac.client import Client

    url = "https://example.com/stac/v1"
    token = "your_access_token_here"

    # Bearer token auth
    client = Client.open(url)
    client.with_bearer_token(token)

    # Method chaining
    df = Client.open(url).with_bearer_token(token).search(collection_id="test")
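The "Self for method chaining" contract can be pictured with a small stand-in class. `HeaderClient` is hypothetical and is not Sedona's Client; it assumes the token is stored as an "Authorization: Bearer <token>" header alongside any headers passed at construction, and that a later auth call overwrites an earlier one.

```python
from typing import Dict, Optional


class HeaderClient:
    """Hypothetical stand-in that mimics the header handling and chaining of with_bearer_token."""

    def __init__(self, url: str, headers: Optional[Dict[str, str]] = None) -> None:
        self.url = url
        # Copy the caller's dict so adding auth headers does not mutate it.
        self.headers: Dict[str, str] = dict(headers or {})

    def with_bearer_token(self, token: str) -> "HeaderClient":
        # Set (or overwrite) the Authorization header, then return self so calls can be chained.
        self.headers["Authorization"] = f"Bearer {token}"
        return self


base_headers = {"Accept": "application/json"}
client = HeaderClient("https://example.com/stac/v1", headers=base_headers).with_bearer_token("abc123")
print(client.headers)
# {'Accept': 'application/json', 'Authorization': 'Bearer abc123'}
```

Returning self is what allows a one-line chain such as Client.open(url).with_bearer_token(token).search(...).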

sedona.spark.stac.collection_client module

class sedona.spark.stac.collection_client.CollectionClient(url: str, collection_id: str | None = None, headers: dict | None = None)

Bases: object

__init__(url: str, collection_id: str | None = None, headers: dict | None = None)

Initializes a collection client for a STAC collection.

Parameters:
  • url – The base URL of the STAC API.

  • collection_id – The ID of the collection to access. If None, accesses the catalog root.

  • headers – Optional dictionary of HTTP headers for authentication.

get_dataframe(*ids: str | list, bbox: list | None = None, geometry: str | BaseGeometry | List[str | BaseGeometry] | None = None, datetime: str | datetime | list | None = None, max_items: int | None = None) → DataFrame

Returns a Spark DataFrame of items with optional spatial and temporal extents.

This method loads the collection data from the specified collection URL and applies optional spatial and temporal filters to the data. The spatial filter is applied using a bounding box or a geometry, and the temporal filter is applied using a date-time range.

Parameters:
  • ids – A variable number of item IDs to filter the items. Example: “item_id1” or [“item_id1”, “item_id2”]

  • bbox – A list of bounding boxes for filtering the items. Each bounding box is represented as a list of four float values: [min_lon, min_lat, max_lon, max_lat]. Example: [[-180.0, -90.0, 180.0, 90.0]], which covers the entire world.

  • geometry – Shapely geometry object(s) or WKT string(s) for spatial filtering. Can be a single geometry, WKT string, or a list of geometries/WKT strings. If both bbox and geometry are provided, geometry takes precedence. Example: Polygon(…) or “POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))” or [Polygon(…), Polygon(…)]

  • datetime – A single datetime, RFC 3339-compliant timestamp, or a list of date-time ranges for filtering the items. Example: “2020-01-01T00:00:00Z” or python_datetime.datetime(2020, 1, 1) or [[“2020-01-01T00:00:00Z”, “2021-01-01T00:00:00Z”]]

  • max_items – The maximum number of items to return from the search.

Returns:

A Spark DataFrame containing the filtered items. If no filters are provided, the DataFrame contains all items in the collection.

Raises:

RuntimeError – If loading the data or applying the filters fails. The error message indicates the cause of the failure.
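The ids parameter is variadic and typed str | list, so a call may pass individual strings, a list, or both. One way such arguments can be normalized into a flat list before filtering is sketched below. `flatten_ids` is hypothetical, and whether Sedona accepts strings and lists mixed in a single call is an assumption.

```python
from typing import Iterable, List, Union


def flatten_ids(*ids: Union[str, Iterable[str]]) -> List[str]:
    """Flatten positional item IDs, each a string or an iterable of strings, into one list."""
    flat: List[str] = []
    for entry in ids:
        if isinstance(entry, str):
            # A single ID passed directly.
            flat.append(entry)
        else:
            # A list (or other iterable) of IDs.
            flat.extend(entry)
    return flat


print(flatten_ids("item_id1"))
# ['item_id1']
print(flatten_ids(["item_id1", "item_id2"]))
# ['item_id1', 'item_id2']
```

The isinstance check comes first because a str is itself iterable; without it, "item_id1" would be split into single characters.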

get_items(*ids: str | list, bbox: list | None = None, geometry: str | BaseGeometry | List[str | BaseGeometry] | None = None, datetime: str | datetime | list | None = None, max_items: int | None = None) → Iterator

Returns an iterator of items, optionally filtered by item ID and by spatial and temporal extents.

This method loads the collection data from the specified collection URL and applies optional filters to the data.

Parameters:
  • ids – One or more item IDs, or a list of item IDs, to filter the items. If not provided, no ID filtering is applied.

  • bbox – A list of bounding boxes for filtering the items.

  • geometry – Shapely geometry object(s) or WKT string(s) for spatial filtering. Can be a single geometry, WKT string, or a list of geometries/WKT strings. If both bbox and geometry are provided, geometry takes precedence.

  • datetime – A single datetime, RFC 3339-compliant timestamp, or a list of date-time ranges for filtering the items.

  • max_items – The maximum number of items to return from the search, even if there are more matching results.

Returns:

An iterator of PyStacItem objects that match the specified filters. If no filters are provided, the iterator contains all items in the collection.

Raises:

RuntimeError – If loading the data or applying the filters fails. The error message indicates the cause of the failure.
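The bbox filter and max_items cap can be pictured with a pure-Python sketch over plain item dicts. This is not how Sedona evaluates the filter (it operates on Spark DataFrames). The item layout, the interpretation of a list of boxes as "keep an item if it overlaps any box", and `filter_items` itself are assumptions for illustration; boxes crossing the antimeridian are not handled.

```python
from typing import Dict, List, Optional, Sequence

BBox = Sequence[float]  # [min_lon, min_lat, max_lon, max_lat]


def bboxes_intersect(a: BBox, b: BBox) -> bool:
    """True if two axis-aligned boxes overlap; touching edges count as overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def filter_items(
    items: List[Dict], bbox: Optional[List[BBox]] = None, max_items: Optional[int] = None
) -> List[Dict]:
    """Keep items whose bbox overlaps any query box, then truncate to max_items."""
    selected = [
        item
        for item in items
        if bbox is None or any(bboxes_intersect(item["bbox"], box) for box in bbox)
    ]
    return selected if max_items is None else selected[:max_items]


items = [
    {"id": "a", "bbox": [0.0, 0.0, 1.0, 1.0]},
    {"id": "b", "bbox": [10.0, 10.0, 11.0, 11.0]},
    {"id": "c", "bbox": [0.5, 0.5, 2.0, 2.0]},
]
print([item["id"] for item in filter_items(items, bbox=[[0.0, 0.0, 1.5, 1.5]])])
# ['a', 'c']
```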

load_items_df(bbox, geometry, datetime, ids, max_items)

Loads items from the STAC collection as a Spark DataFrame.

This method handles the conversion of headers to Spark options and applies various filters to the data.

save_to_geoparquet(*ids: str | list, output_path: str, bbox: list | None = None, geometry: str | BaseGeometry | List[str | BaseGeometry] | None = None, datetime: list | None = None) → None

Loads the STAC DataFrame and saves it to Parquet format at the given output path.

This method loads the collection data from the specified collection URL and applies optional spatial and temporal filters to the data. The filtered data is then saved to the specified output path in Parquet format.

Parameters:
  • ids – A list of item IDs to filter the items. If not provided, no ID filtering is applied.

  • output_path – The path where the Parquet file will be saved.

  • bbox – A bounding box for filtering the items. If not provided, no spatial filtering is applied.

  • geometry – Shapely geometry object(s) or WKT string(s) for spatial filtering. If both bbox and geometry are provided, geometry takes precedence.

  • datetime – A temporal extent that defines the date-time range for filtering the items. If not provided, no temporal filtering is applied. To match a single instant, set the start and end of the range to the same value. Example: [[“2020-01-01T00:00:00Z”, “2020-01-01T00:00:00Z”]]

Raises:

RuntimeError – If loading the data, applying the filters, or saving the DataFrame to Parquet format fails. The error message indicates the cause of the failure.

sedona.spark.stac.collection_client.get_collection_url(url: str, collection_id: str | None = None) → str

Constructs the collection URL based on the provided base URL and optional collection ID.

If a collection ID is provided and the URL starts with ‘http’ or ‘https’, the collection ID is appended to the URL. If a collection ID is provided but the URL does not start with ‘http’ or ‘https’, a ValueError is raised.

Parameters:
  • url (str) – The base URL of the STAC collection.

  • collection_id (Optional[str]) – The optional collection ID to append to the URL.

Returns:

The constructed collection URL.

Return type:

str

Raises:

ValueError – If the URL does not start with ‘http’ or ‘https’ and a collection ID is provided.
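A hypothetical sketch of the rule described above. It assumes the collection ID is appended as /collections/{collection_id}, the path layout the STAC API specification uses for collections, and that the base URL is returned unchanged when no collection ID is given; both the exact suffix Sedona appends and its behavior without a collection ID are assumptions here.

```python
from typing import Optional


def collection_url(url: str, collection_id: Optional[str] = None) -> str:
    """Sketch of the documented rule; the '/collections/' segment is an assumption."""
    if collection_id is None:
        # Assumption: with no collection ID, the base URL is used as-is.
        return url
    if not url.startswith("http"):
        # startswith("http") also covers "https", matching the documented check.
        raise ValueError(f"URL must start with 'http' or 'https': {url!r}")
    return f"{url.rstrip('/')}/collections/{collection_id}"


print(collection_url("https://example.com/stac/v1", "aster-l1t"))
# https://example.com/stac/v1/collections/aster-l1t
```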

Module contents