Class KnnJoinIndexJudgement<T extends org.locationtech.jts.geom.Geometry,U extends org.locationtech.jts.geom.Geometry>
- java.lang.Object
-
- org.apache.sedona.core.joinJudgement.KnnJoinIndexJudgement<T,U>
-
- Type Parameters:
T - extends Geometry - the type of geometries in the left set
U - extends Geometry - the type of geometries in the right set
- All Implemented Interfaces:
Serializable, org.apache.spark.api.java.function.FlatMapFunction2<Iterator<T>,Iterator<U>,org.apache.commons.lang3.tuple.Pair<T,U>>
public class KnnJoinIndexJudgement<T extends org.locationtech.jts.geom.Geometry,U extends org.locationtech.jts.geom.Geometry> extends Object implements org.apache.spark.api.java.function.FlatMapFunction2<Iterator<T>,Iterator<U>,org.apache.commons.lang3.tuple.Pair<T,U>>, Serializable
This class performs a K-nearest neighbors (KNN) join using a spatial index. It extends the JudgementBase class and implements the FlatMapFunction2 interface.
- See Also:
- Serialized Form
-
-
Constructor Summary
Constructors
KnnJoinIndexJudgement(int k, DistanceMetric distanceMetric, boolean includeTies, org.apache.spark.broadcast.Broadcast<List<T>> broadcastQueryObjects, org.apache.spark.broadcast.Broadcast<org.locationtech.jts.index.strtree.STRtree> broadcastObjectsTreeIndex, org.apache.spark.util.LongAccumulator buildCount, org.apache.spark.util.LongAccumulator streamCount, org.apache.spark.util.LongAccumulator resultCount, org.apache.spark.util.LongAccumulator candidateCount)
    Constructor for the KnnJoinIndexJudgement class.
-
Method Summary
Modifier and Type, Method, Description
Iterator<org.apache.commons.lang3.tuple.Pair<T,U>> call(Iterator<T> queryShapes, Iterator<U> objectShapes)
    This method performs the KNN join operation.
Iterator<org.apache.commons.lang3.tuple.Pair<T,U>> callUsingBroadcastObjectIndex(Iterator<T> queryShapes)
    This method performs the KNN join operation using the broadcast spatial index built using all geometries in the object side.
Iterator<org.apache.commons.lang3.tuple.Pair<T,U>> callUsingBroadcastQueryList(Iterator<U> objectShapes)
    This method performs the KNN join operation using the broadcast query geometries.
static <U extends org.locationtech.jts.geom.Geometry,T extends org.locationtech.jts.geom.Geometry> double distance(U key, T value, DistanceMetric distanceMetric)
static double distanceByMetric(org.locationtech.jts.geom.Geometry queryGeom, org.locationtech.jts.geom.Geometry candidateGeom, DistanceMetric distanceMetric)
    This method calculates the distance between two geometries using the specified distance metric.
static org.locationtech.jts.index.strtree.ItemDistance getItemDistance(DistanceMetric distanceMetric)
static org.locationtech.jts.index.strtree.ItemDistance getItemDistanceByMetric(DistanceMetric distanceMetric)
    This method returns the ItemDistance object based on the specified distance metric.
protected boolean hasNextBase(List<? extends org.locationtech.jts.geom.Geometry> buildShapes, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes)
    Iterator model for the nested loop join.
protected boolean hasNextBase(org.locationtech.jts.index.SpatialIndex spatialIndex, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes, boolean buildLeft)
    Iterator model for the index-based join.
protected void initPartition()
    Looks up the extent of the current partition.
protected void log(String message, Object... params)
protected org.apache.commons.lang3.tuple.Pair<U,T> nextBase(List<? extends org.locationtech.jts.geom.Geometry> buildShapes, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes)
    Iterator model for the nested loop join.
protected org.apache.commons.lang3.tuple.Pair<U,T> nextBase(org.locationtech.jts.index.SpatialIndex spatialIndex, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes, boolean buildLeft)
    Iterator model for the index-based join.
-
-
-
Field Detail
-
buildCount
protected final org.apache.spark.util.LongAccumulator buildCount
-
streamCount
protected final org.apache.spark.util.LongAccumulator streamCount
-
resultCount
protected final org.apache.spark.util.LongAccumulator resultCount
-
candidateCount
protected final org.apache.spark.util.LongAccumulator candidateCount
-
-
Constructor Detail
-
KnnJoinIndexJudgement
public KnnJoinIndexJudgement(int k, DistanceMetric distanceMetric, boolean includeTies, org.apache.spark.broadcast.Broadcast<List<T>> broadcastQueryObjects, org.apache.spark.broadcast.Broadcast<org.locationtech.jts.index.strtree.STRtree> broadcastObjectsTreeIndex, org.apache.spark.util.LongAccumulator buildCount, org.apache.spark.util.LongAccumulator streamCount, org.apache.spark.util.LongAccumulator resultCount, org.apache.spark.util.LongAccumulator candidateCount)
Constructor for the KnnJoinIndexJudgement class.
- Parameters:
k - the number of nearest neighbors to find
distanceMetric - the distance metric to use
broadcastQueryObjects - the broadcast geometries on queries
broadcastObjectsTreeIndex - the broadcast spatial index on objects
buildCount - accumulator for the number of geometries processed from the build side
streamCount - accumulator for the number of geometries processed from the stream side
resultCount - accumulator for the number of join results
candidateCount - accumulator for the number of candidate matches
-
-
Method Detail
-
call
public Iterator<org.apache.commons.lang3.tuple.Pair<T,U>> call(Iterator<T> queryShapes, Iterator<U> objectShapes) throws Exception
This method performs the KNN join operation. It iterates over the geometries in the stream side and uses the spatial index to find the k nearest neighbors for each geometry. The method returns an iterator over the join results.
- Specified by:
call in interface org.apache.spark.api.java.function.FlatMapFunction2<Iterator<T extends org.locationtech.jts.geom.Geometry>,Iterator<U extends org.locationtech.jts.geom.Geometry>,org.apache.commons.lang3.tuple.Pair<T extends org.locationtech.jts.geom.Geometry,U extends org.locationtech.jts.geom.Geometry>>
- Parameters:
queryShapes - iterator over the geometries in the query side
objectShapes - iterator over the geometries in the object side
- Returns:
- an iterator over the join results
- Throws:
Exception - if the spatial index is not of type STRtree
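To make the join semantics concrete, the following is a minimal brute-force sketch, not Sedona's index-based implementation. It assumes plain 2D points stored as `double[]` instead of JTS geometries, a full scan instead of an STRtree lookup, and Euclidean distance. The meaning given to `includeTies` here (also keep objects whose distance equals the k-th distance) is an assumption, since this page does not document that parameter. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Brute-force KNN join sketch over 2D points (hypothetical; not Sedona's code). */
public class KnnJoinSketch {

    /** Euclidean distance between two points given as {x, y}. */
    public static double dist(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    /**
     * For each query point, returns {queryIndex, objectIndex} pairs for its k nearest objects.
     * Assumption: when includeTies is true, objects tied with the k-th distance are kept too.
     */
    public static List<int[]> knnJoin(List<double[]> queries, List<double[]> objects,
                                      int k, boolean includeTies) {
        List<int[]> result = new ArrayList<>();
        for (int q = 0; q < queries.size(); q++) {
            final double[] query = queries.get(q);

            // Sort object indices by distance to the current query point.
            List<Integer> order = new ArrayList<>();
            for (int o = 0; o < objects.size(); o++) {
                order.add(o);
            }
            order.sort(Comparator.comparingDouble(o -> dist(query, objects.get(o))));

            // Keep the first k, then extend past k while distances equal the k-th one.
            int limit = Math.min(k, order.size());
            if (includeTies && limit > 0) {
                double kth = dist(query, objects.get(order.get(limit - 1)));
                while (limit < order.size()
                        && dist(query, objects.get(order.get(limit))) == kth) {
                    limit++;
                }
            }
            for (int i = 0; i < limit; i++) {
                result.add(new int[] {q, order.get(i)});
            }
        }
        return result;
    }
}
```

With one query at the origin and three objects at distance 1, `knnJoin(queries, objects, 1, false)` yields one pair, while the same call with `includeTies` set to `true` yields three. A spatial index replaces the full sort with a bounded nearest-neighbor search, but the result set is meant to be the same.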
-
callUsingBroadcastObjectIndex
public Iterator<org.apache.commons.lang3.tuple.Pair<T,U>> callUsingBroadcastObjectIndex(Iterator<T> queryShapes)
This method performs the KNN join operation using the broadcast spatial index built using all geometries in the object side.
- Parameters:
queryShapes - iterator over the geometries in the query side
- Returns:
- an iterator over the join results
-
callUsingBroadcastQueryList
public Iterator<org.apache.commons.lang3.tuple.Pair<T,U>> callUsingBroadcastQueryList(Iterator<U> objectShapes)
This method performs the KNN join operation using the broadcast query geometries.
- Parameters:
objectShapes - iterator over the geometries in the object side
- Returns:
- an iterator over the join results
-
distanceByMetric
public static double distanceByMetric(org.locationtech.jts.geom.Geometry queryGeom, org.locationtech.jts.geom.Geometry candidateGeom, DistanceMetric distanceMetric)
This method calculates the distance between two geometries using the specified distance metric.
- Parameters:
queryGeom - the query geometry
candidateGeom - the candidate geometry
distanceMetric - the distance metric to use
- Returns:
- the distance between the two geometries
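As an illustration of dispatching on a distance metric, here is a small sketch under stated assumptions. The `Metric` enum is a hypothetical stand-in for `DistanceMetric`, whose actual constants are not listed on this page. Points are `double[]` values of the form `{x, y}` instead of JTS geometries, and the haversine branch assumes longitude and latitude in degrees with a mean Earth radius of 6371008.8 m.

```java
/** Distance dispatch sketch; Metric is a hypothetical stand-in for DistanceMetric. */
public class DistanceSketch {

    public enum Metric { EUCLIDEAN, HAVERSINE }

    /** Mean Earth radius in metres, assumed for the haversine branch. */
    public static final double EARTH_RADIUS_M = 6371008.8;

    /**
     * Distance between two points given as {x, y}. For HAVERSINE, x is longitude and
     * y is latitude, both in degrees, and the result is in metres.
     */
    public static double distanceByMetric(double[] a, double[] b, Metric metric) {
        switch (metric) {
            case EUCLIDEAN:
                // Planar distance in the units of the input coordinates.
                return Math.hypot(a[0] - b[0], a[1] - b[1]);
            case HAVERSINE:
                double lat1 = Math.toRadians(a[1]);
                double lat2 = Math.toRadians(b[1]);
                double dLat = lat2 - lat1;
                double dLon = Math.toRadians(b[0] - a[0]);
                double h = Math.pow(Math.sin(dLat / 2), 2)
                        + Math.cos(lat1) * Math.cos(lat2) * Math.pow(Math.sin(dLon / 2), 2);
                // Clamp to 1.0 to guard asin against floating-point overshoot.
                return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1.0, Math.sqrt(h)));
            default:
                throw new IllegalArgumentException("Unsupported metric: " + metric);
        }
    }
}
```

The two branches return values in different units, so neighbors ranked under one metric should not be compared against distances computed under the other.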
-
getItemDistanceByMetric
public static org.locationtech.jts.index.strtree.ItemDistance getItemDistanceByMetric(DistanceMetric distanceMetric)
This method returns the ItemDistance object based on the specified distance metric.
- Parameters:
distanceMetric - the distance metric to use
- Returns:
- the ItemDistance object
-
distance
public static <U extends org.locationtech.jts.geom.Geometry,T extends org.locationtech.jts.geom.Geometry> double distance(U key, T value, DistanceMetric distanceMetric)
-
getItemDistance
public static org.locationtech.jts.index.strtree.ItemDistance getItemDistance(DistanceMetric distanceMetric)
-
initPartition
protected void initPartition()
Looks up the extent of the current partition. If found, the `match` method will activate the logic to avoid emitting duplicate join results from multiple partitions. Must be called before processing a partition. Must be called from the same instance that will be used to process the partition.
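This page does not say which rule `match` applies, so the following only sketches one common way a partition extent can suppress duplicates, the reference-point technique. A pair that overlaps several partitions is emitted only by the partition whose extent contains the lower-left corner of the intersection of the two bounding rectangles. Rectangles are assumed to be `double[]` values of the form `{minX, minY, maxX, maxY}`, and all names are hypothetical.

```java
/** Reference-point duplicate avoidance sketch (an assumed technique, not confirmed as Sedona's). */
public class DedupSketch {

    /** True if the two rectangles {minX, minY, maxX, maxY} share at least one point. */
    public static boolean intersects(double[] a, double[] b) {
        return a[0] <= b[2] && b[0] <= a[2] && a[1] <= b[3] && b[1] <= a[3];
    }

    /**
     * Decides whether the partition with the given extent should emit the pair (a, b).
     * The pair is emitted only where the lower-left corner of the intersection of the two
     * rectangles falls inside the extent, treated as half-open: [minX, maxX) x [minY, maxY).
     */
    public static boolean shouldEmit(double[] extent, double[] a, double[] b) {
        if (!intersects(a, b)) {
            return false;
        }
        // Lower-left corner of the intersection of a and b.
        double refX = Math.max(a[0], b[0]);
        double refY = Math.max(a[1], b[1]);
        return refX >= extent[0] && refX < extent[2]
                && refY >= extent[1] && refY < extent[3];
    }
}
```

Because the extents are half-open and assumed not to overlap, a reference point on a shared boundary belongs to exactly one partition, so each pair is reported once.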
-
hasNextBase
protected boolean hasNextBase(org.locationtech.jts.index.SpatialIndex spatialIndex, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes, boolean buildLeft)
Iterator model for the index-based join. It checks whether there is a next match and populates it into the result.
- Parameters:
spatialIndex -
streamShapes -
buildLeft -
- Returns:
-
hasNextBase
protected boolean hasNextBase(List<? extends org.locationtech.jts.geom.Geometry> buildShapes, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes)
Iterator model for the nested loop join. It checks whether there is a next match and populates it into the result.
- Parameters:
buildShapes -
streamShapes -
- Returns:
-
nextBase
protected org.apache.commons.lang3.tuple.Pair<U,T> nextBase(org.locationtech.jts.index.SpatialIndex spatialIndex, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes, boolean buildLeft)
Iterator model for the index-based join. It returns one pair from the current batch. Each batch contains a list of pairs of geometries that satisfy the join condition. The current batch is the result of the current stream shape against all the build shapes.
- Parameters:
spatialIndex -
streamShapes -
buildLeft -
- Returns:
-
nextBase
protected org.apache.commons.lang3.tuple.Pair<U,T> nextBase(List<? extends org.locationtech.jts.geom.Geometry> buildShapes, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes)
Iterator model for the nested loop join. It returns one pair from the current batch. Each batch contains a list of pairs of geometries that satisfy the join condition. The current batch is the result of the current stream shape against all the build shapes.
- Parameters:
buildShapes -
streamShapes -
- Returns:
-
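The batch-at-a-time iterator model described for `hasNextBase` and `nextBase` can be sketched as follows. This is a hypothetical, simplified version for the nested loop case, not the class's actual code: it is generic over arbitrary element types rather than JTS geometries, takes the join condition as a `BiPredicate`, and returns `Map.Entry` pairs instead of commons-lang3 `Pair` objects.

```java
import java.util.AbstractMap;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.BiPredicate;

/** Batch-at-a-time nested loop join iterator sketch (hypothetical; not Sedona's code). */
public class NestedLoopJoinIterator<B, S> implements Iterator<Map.Entry<B, S>> {
    private final List<B> buildShapes;
    private final Iterator<S> streamShapes;
    private final BiPredicate<B, S> condition;
    // Pairs produced by the current stream item that have not been returned yet.
    private final Deque<Map.Entry<B, S>> batch = new ArrayDeque<>();

    public NestedLoopJoinIterator(List<B> buildShapes, Iterator<S> streamShapes,
                                  BiPredicate<B, S> condition) {
        this.buildShapes = buildShapes;
        this.streamShapes = streamShapes;
        this.condition = condition;
    }

    /** Advances through stream items until one yields a non-empty batch. */
    @Override
    public boolean hasNext() {
        while (batch.isEmpty() && streamShapes.hasNext()) {
            S stream = streamShapes.next();
            // The batch is the current stream item tested against all build items.
            for (B build : buildShapes) {
                if (condition.test(build, stream)) {
                    batch.add(new AbstractMap.SimpleImmutableEntry<>(build, stream));
                }
            }
        }
        return !batch.isEmpty();
    }

    /** Returns one pair from the current batch. */
    @Override
    public Map.Entry<B, S> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return batch.poll();
    }
}
```

Filling the batch lazily inside `hasNext()` keeps memory bounded by the matches of a single stream item, which is the main reason to prefer this model over materializing the whole join result.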
-