Class KnnJoinIndexJudgement<T extends org.locationtech.jts.geom.Geometry,U extends org.locationtech.jts.geom.Geometry>
- java.lang.Object
-
- org.apache.sedona.core.joinJudgement.KnnJoinIndexJudgement<T,U>
-
- Type Parameters:
T
- extends Geometry - the type of geometries in the left setU
- extends Geometry - the type of geometries in the right set
- All Implemented Interfaces:
Serializable
,org.apache.spark.api.java.function.FlatMapFunction2<Iterator<T>,Iterator<U>,org.apache.commons.lang3.tuple.Pair<T,U>>
public class KnnJoinIndexJudgement<T extends org.locationtech.jts.geom.Geometry,U extends org.locationtech.jts.geom.Geometry> extends Object implements org.apache.spark.api.java.function.FlatMapFunction2<Iterator<T>,Iterator<U>,org.apache.commons.lang3.tuple.Pair<T,U>>, Serializable
This class is responsible for performing a K-nearest neighbors (KNN) join operation using a spatial index. It extends the JudgementBase class and implements the FlatMapFunction2 interface.- See Also:
- Serialized Form
-
-
Constructor Summary
Constructors Constructor Description KnnJoinIndexJudgement(int k, DistanceMetric distanceMetric, boolean includeTies, org.apache.spark.broadcast.Broadcast<List<T>> broadcastQueryObjects, org.apache.spark.broadcast.Broadcast<org.locationtech.jts.index.strtree.STRtree> broadcastObjectsTreeIndex, org.apache.spark.util.LongAccumulator buildCount, org.apache.spark.util.LongAccumulator streamCount, org.apache.spark.util.LongAccumulator resultCount, org.apache.spark.util.LongAccumulator candidateCount)
Constructor for the KnnJoinIndexJudgement class.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description Iterator<org.apache.commons.lang3.tuple.Pair<T,U>>
call(Iterator<T> queryShapes, Iterator<U> objectShapes)
This method performs the KNN join operation.Iterator<org.apache.commons.lang3.tuple.Pair<T,U>>
callUsingBroadcastObjectIndex(Iterator<T> queryShapes)
This method performs the KNN join operation using the broadcast spatial index built using all geometries in the object side.Iterator<org.apache.commons.lang3.tuple.Pair<T,U>>
callUsingBroadcastQueryList(Iterator<U> objectShapes)
This method performs the KNN join operation using the broadcast query geometries.static <U extends org.locationtech.jts.geom.Geometry,T extends org.locationtech.jts.geom.Geometry>
doubledistance(U key, T value, DistanceMetric distanceMetric)
static double
distanceByMetric(org.locationtech.jts.geom.Geometry queryGeom, org.locationtech.jts.geom.Geometry candidateGeom, DistanceMetric distanceMetric)
This method calculates the distance between two geometries using the specified distance metric.static org.locationtech.jts.index.strtree.ItemDistance
getItemDistance(DistanceMetric distanceMetric)
static org.locationtech.jts.index.strtree.ItemDistance
getItemDistanceByMetric(DistanceMetric distanceMetric)
This method returns the ItemDistance object based on the specified distance metric.protected boolean
hasNextBase(List<? extends org.locationtech.jts.geom.Geometry> buildShapes, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes)
Iterator model for the nest loop join.protected boolean
hasNextBase(org.locationtech.jts.index.SpatialIndex spatialIndex, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes, boolean buildLeft)
Iterator model for the index-based join.protected void
initPartition()
Looks up the extent of the current partition.protected void
log(String message, Object... params)
protected org.apache.commons.lang3.tuple.Pair<U,T>
nextBase(List<? extends org.locationtech.jts.geom.Geometry> buildShapes, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes)
Iterator model for the nest loop join.protected org.apache.commons.lang3.tuple.Pair<U,T>
nextBase(org.locationtech.jts.index.SpatialIndex spatialIndex, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes, boolean buildLeft)
Iterator model for the index-based join.
-
-
-
Field Detail
-
buildCount
protected final org.apache.spark.util.LongAccumulator buildCount
-
streamCount
protected final org.apache.spark.util.LongAccumulator streamCount
-
resultCount
protected final org.apache.spark.util.LongAccumulator resultCount
-
candidateCount
protected final org.apache.spark.util.LongAccumulator candidateCount
-
-
Constructor Detail
-
KnnJoinIndexJudgement
public KnnJoinIndexJudgement(int k, DistanceMetric distanceMetric, boolean includeTies, org.apache.spark.broadcast.Broadcast<List<T>> broadcastQueryObjects, org.apache.spark.broadcast.Broadcast<org.locationtech.jts.index.strtree.STRtree> broadcastObjectsTreeIndex, org.apache.spark.util.LongAccumulator buildCount, org.apache.spark.util.LongAccumulator streamCount, org.apache.spark.util.LongAccumulator resultCount, org.apache.spark.util.LongAccumulator candidateCount)
Constructor for the KnnJoinIndexJudgement class.- Parameters:
k
- the number of nearest neighbors to finddistanceMetric
- the distance metric to usebroadcastQueryObjects
- the broadcast geometries on queriesbroadcastObjectsTreeIndex
- the broadcast spatial index on objectsbuildCount
- accumulator for the number of geometries processed from the build sidestreamCount
- accumulator for the number of geometries processed from the stream sideresultCount
- accumulator for the number of join resultscandidateCount
- accumulator for the number of candidate matches
-
-
Method Detail
-
call
public Iterator<org.apache.commons.lang3.tuple.Pair<T,U>> call(Iterator<T> queryShapes, Iterator<U> objectShapes) throws Exception
This method performs the KNN join operation. It iterates over the geometries in the stream side and uses the spatial index to find the k nearest neighbors for each geometry. The method returns an iterator over the join results.- Specified by:
call
in interfaceorg.apache.spark.api.java.function.FlatMapFunction2<Iterator<T extends org.locationtech.jts.geom.Geometry>,Iterator<U extends org.locationtech.jts.geom.Geometry>,org.apache.commons.lang3.tuple.Pair<T extends org.locationtech.jts.geom.Geometry,U extends org.locationtech.jts.geom.Geometry>>
- Parameters:
queryShapes
- iterator over the geometries in the query sideobjectShapes
- iterator over the geometries in the object side- Returns:
- an iterator over the join results
- Throws:
Exception
- if the spatial index is not of type STRtree
-
callUsingBroadcastObjectIndex
public Iterator<org.apache.commons.lang3.tuple.Pair<T,U>> callUsingBroadcastObjectIndex(Iterator<T> queryShapes)
This method performs the KNN join operation using the broadcast spatial index built using all geometries in the object side.- Parameters:
queryShapes
- iterator over the geometries in the query side- Returns:
- an iterator over the join results
-
callUsingBroadcastQueryList
public Iterator<org.apache.commons.lang3.tuple.Pair<T,U>> callUsingBroadcastQueryList(Iterator<U> objectShapes)
This method performs the KNN join operation using the broadcast query geometries.- Parameters:
objectShapes
- iterator over the geometries in the object side- Returns:
- an iterator over the join results
-
distanceByMetric
public static double distanceByMetric(org.locationtech.jts.geom.Geometry queryGeom, org.locationtech.jts.geom.Geometry candidateGeom, DistanceMetric distanceMetric)
This method calculates the distance between two geometries using the specified distance metric.- Parameters:
queryGeom
- the query geometrycandidateGeom
- the candidate geometrydistanceMetric
- the distance metric to use- Returns:
- the distance between the two geometries
-
getItemDistanceByMetric
public static org.locationtech.jts.index.strtree.ItemDistance getItemDistanceByMetric(DistanceMetric distanceMetric)
This method returns the ItemDistance object based on the specified distance metric.- Parameters:
distanceMetric
- the distance metric to use- Returns:
- the ItemDistance object
-
distance
public static <U extends org.locationtech.jts.geom.Geometry,T extends org.locationtech.jts.geom.Geometry> double distance(U key, T value, DistanceMetric distanceMetric)
-
getItemDistance
public static org.locationtech.jts.index.strtree.ItemDistance getItemDistance(DistanceMetric distanceMetric)
-
initPartition
protected void initPartition()
Looks up the extent of the current partition. If found, `match` method will activate the logic to avoid emitting duplicate join results from multiple partitions.Must be called before processing a partition. Must be called from the same instance that will be used to process the partition.
-
hasNextBase
protected boolean hasNextBase(org.locationtech.jts.index.SpatialIndex spatialIndex, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes, boolean buildLeft)
Iterator model for the index-based join. It checks if there is a next match and populate it to the result.- Parameters:
spatialIndex
-streamShapes
-buildLeft
-- Returns:
-
hasNextBase
protected boolean hasNextBase(List<? extends org.locationtech.jts.geom.Geometry> buildShapes, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes)
Iterator model for the nest loop join. It checks if there is a next match and populate it to the result.- Parameters:
buildShapes
-streamShapes
-- Returns:
-
nextBase
protected org.apache.commons.lang3.tuple.Pair<U,T> nextBase(org.locationtech.jts.index.SpatialIndex spatialIndex, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes, boolean buildLeft)
Iterator model for the index-based join. It returns 1 pair in the current batch. Each batch contains a list of pairs of geometries that satisfy the join condition. The current batch is the result of the current stream shape against all the build shapes.- Parameters:
spatialIndex
-streamShapes
-buildLeft
-- Returns:
-
nextBase
protected org.apache.commons.lang3.tuple.Pair<U,T> nextBase(List<? extends org.locationtech.jts.geom.Geometry> buildShapes, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes)
Iterator model for the nest loop join. It returns 1 pair in the current batch. Each batch contains a list of pairs of geometries that satisfy the join condition. The current batch is the result of the current stream shape against all the build shapes.- Parameters:
buildShapes
-streamShapes
-- Returns:
-
-