Class KnnJoinIndexJudgement<T extends org.locationtech.jts.geom.Geometry,​U extends org.locationtech.jts.geom.Geometry>

  • Type Parameters:
    T - extends Geometry - the type of geometries in the left set
    U - extends Geometry - the type of geometries in the right set
    All Implemented Interfaces:
    Serializable, org.apache.spark.api.java.function.FlatMapFunction2<Iterator<T>,​Iterator<U>,​org.apache.commons.lang3.tuple.Pair<T,​U>>

    public class KnnJoinIndexJudgement<T extends org.locationtech.jts.geom.Geometry,​U extends org.locationtech.jts.geom.Geometry>
    extends Object
    implements org.apache.spark.api.java.function.FlatMapFunction2<Iterator<T>,​Iterator<U>,​org.apache.commons.lang3.tuple.Pair<T,​U>>, Serializable
    This class is responsible for performing a K-nearest neighbors (KNN) join operation using a spatial index. It extends the JudgementBase class and implements the FlatMapFunction2 interface.
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected org.apache.spark.util.LongAccumulator buildCount  
      protected org.apache.spark.util.LongAccumulator candidateCount  
      protected org.apache.spark.util.LongAccumulator resultCount  
      protected org.apache.spark.util.LongAccumulator streamCount  
    • Constructor Summary

      Constructors 
      Constructor Description
      KnnJoinIndexJudgement​(int k, DistanceMetric distanceMetric, boolean includeTies, org.apache.spark.broadcast.Broadcast<List<T>> broadcastQueryObjects, org.apache.spark.broadcast.Broadcast<org.locationtech.jts.index.strtree.STRtree> broadcastObjectsTreeIndex, org.apache.spark.util.LongAccumulator buildCount, org.apache.spark.util.LongAccumulator streamCount, org.apache.spark.util.LongAccumulator resultCount, org.apache.spark.util.LongAccumulator candidateCount)
      Constructor for the KnnJoinIndexJudgement class.
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      Iterator<org.apache.commons.lang3.tuple.Pair<T,​U>> call​(Iterator<T> queryShapes, Iterator<U> objectShapes)
      This method performs the KNN join operation.
      Iterator<org.apache.commons.lang3.tuple.Pair<T,​U>> callUsingBroadcastObjectIndex​(Iterator<T> queryShapes)
      This method performs the KNN join operation using the broadcast spatial index built using all geometries in the object side.
      Iterator<org.apache.commons.lang3.tuple.Pair<T,​U>> callUsingBroadcastQueryList​(Iterator<U> objectShapes)
      This method performs the KNN join operation using the broadcast query geometries.
      static <U extends org.locationtech.jts.geom.Geometry,​T extends org.locationtech.jts.geom.Geometry>
      double
      distance​(U key, T value, DistanceMetric distanceMetric)  
      static double distanceByMetric​(org.locationtech.jts.geom.Geometry queryGeom, org.locationtech.jts.geom.Geometry candidateGeom, DistanceMetric distanceMetric)
      This method calculates the distance between two geometries using the specified distance metric.
      static org.locationtech.jts.index.strtree.ItemDistance getItemDistance​(DistanceMetric distanceMetric)  
      static org.locationtech.jts.index.strtree.ItemDistance getItemDistanceByMetric​(DistanceMetric distanceMetric)
      This method returns the ItemDistance object based on the specified distance metric.
      protected boolean hasNextBase​(List<? extends org.locationtech.jts.geom.Geometry> buildShapes, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes)
      Iterator model for the nest loop join.
      protected boolean hasNextBase​(org.locationtech.jts.index.SpatialIndex spatialIndex, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes, boolean buildLeft)
      Iterator model for the index-based join.
      protected void initPartition()
      Looks up the extent of the current partition.
      protected void log​(String message, Object... params)  
      protected org.apache.commons.lang3.tuple.Pair<U,​T> nextBase​(List<? extends org.locationtech.jts.geom.Geometry> buildShapes, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes)
      Iterator model for the nest loop join.
      protected org.apache.commons.lang3.tuple.Pair<U,​T> nextBase​(org.locationtech.jts.index.SpatialIndex spatialIndex, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes, boolean buildLeft)
      Iterator model for the index-based join.
    • Field Detail

      • buildCount

        protected final org.apache.spark.util.LongAccumulator buildCount
      • streamCount

        protected final org.apache.spark.util.LongAccumulator streamCount
      • resultCount

        protected final org.apache.spark.util.LongAccumulator resultCount
      • candidateCount

        protected final org.apache.spark.util.LongAccumulator candidateCount
    • Constructor Detail

      • KnnJoinIndexJudgement

        public KnnJoinIndexJudgement​(int k,
                                     DistanceMetric distanceMetric,
                                     boolean includeTies,
                                     org.apache.spark.broadcast.Broadcast<List<T>> broadcastQueryObjects,
                                     org.apache.spark.broadcast.Broadcast<org.locationtech.jts.index.strtree.STRtree> broadcastObjectsTreeIndex,
                                     org.apache.spark.util.LongAccumulator buildCount,
                                     org.apache.spark.util.LongAccumulator streamCount,
                                     org.apache.spark.util.LongAccumulator resultCount,
                                     org.apache.spark.util.LongAccumulator candidateCount)
        Constructor for the KnnJoinIndexJudgement class.
        Parameters:
        k - the number of nearest neighbors to find
        distanceMetric - the distance metric to use
        broadcastQueryObjects - the broadcast geometries on queries
        broadcastObjectsTreeIndex - the broadcast spatial index on objects
        buildCount - accumulator for the number of geometries processed from the build side
        streamCount - accumulator for the number of geometries processed from the stream side
        resultCount - accumulator for the number of join results
        candidateCount - accumulator for the number of candidate matches
    • Method Detail

      • call

        public Iterator<org.apache.commons.lang3.tuple.Pair<T,​U>> call​(Iterator<T> queryShapes,
                                                                             Iterator<U> objectShapes)
                                                                      throws Exception
        This method performs the KNN join operation. It iterates over the geometries in the stream side and uses the spatial index to find the k nearest neighbors for each geometry. The method returns an iterator over the join results.
        Specified by:
        call in interface org.apache.spark.api.java.function.FlatMapFunction2<Iterator<T extends org.locationtech.jts.geom.Geometry>,​Iterator<U extends org.locationtech.jts.geom.Geometry>,​org.apache.commons.lang3.tuple.Pair<T extends org.locationtech.jts.geom.Geometry,​U extends org.locationtech.jts.geom.Geometry>>
        Parameters:
        queryShapes - iterator over the geometries in the query side
        objectShapes - iterator over the geometries in the object side
        Returns:
        an iterator over the join results
        Throws:
        Exception - if the spatial index is not of type STRtree
      • callUsingBroadcastObjectIndex

        public Iterator<org.apache.commons.lang3.tuple.Pair<T,​U>> callUsingBroadcastObjectIndex​(Iterator<T> queryShapes)
        This method performs the KNN join operation using the broadcast spatial index built using all geometries in the object side.
        Parameters:
        queryShapes - iterator over the geometries in the query side
        Returns:
        an iterator over the join results
      • callUsingBroadcastQueryList

        public Iterator<org.apache.commons.lang3.tuple.Pair<T,​U>> callUsingBroadcastQueryList​(Iterator<U> objectShapes)
        This method performs the KNN join operation using the broadcast query geometries.
        Parameters:
        objectShapes - iterator over the geometries in the object side
        Returns:
        an iterator over the join results
      • distanceByMetric

        public static double distanceByMetric​(org.locationtech.jts.geom.Geometry queryGeom,
                                              org.locationtech.jts.geom.Geometry candidateGeom,
                                              DistanceMetric distanceMetric)
        This method calculates the distance between two geometries using the specified distance metric.
        Parameters:
        queryGeom - the query geometry
        candidateGeom - the candidate geometry
        distanceMetric - the distance metric to use
        Returns:
        the distance between the two geometries
      • getItemDistanceByMetric

        public static org.locationtech.jts.index.strtree.ItemDistance getItemDistanceByMetric​(DistanceMetric distanceMetric)
        This method returns the ItemDistance object based on the specified distance metric.
        Parameters:
        distanceMetric - the distance metric to use
        Returns:
        the ItemDistance object
      • distance

        public static <U extends org.locationtech.jts.geom.Geometry,​T extends org.locationtech.jts.geom.Geometry> double distance​(U key,
                                                                                                                                        T value,
                                                                                                                                        DistanceMetric distanceMetric)
      • getItemDistance

        public static org.locationtech.jts.index.strtree.ItemDistance getItemDistance​(DistanceMetric distanceMetric)
      • initPartition

        protected void initPartition()
        Looks up the extent of the current partition. If found, `match` method will activate the logic to avoid emitting duplicate join results from multiple partitions.

        Must be called before processing a partition. Must be called from the same instance that will be used to process the partition.

      • hasNextBase

        protected boolean hasNextBase​(org.locationtech.jts.index.SpatialIndex spatialIndex,
                                      Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes,
                                      boolean buildLeft)
        Iterator model for the index-based join. It checks if there is a next match and populate it to the result.
        Parameters:
        spatialIndex -
        streamShapes -
        buildLeft -
        Returns:
      • hasNextBase

        protected boolean hasNextBase​(List<? extends org.locationtech.jts.geom.Geometry> buildShapes,
                                      Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes)
        Iterator model for the nest loop join. It checks if there is a next match and populate it to the result.
        Parameters:
        buildShapes -
        streamShapes -
        Returns:
      • nextBase

        protected org.apache.commons.lang3.tuple.Pair<U,​T> nextBase​(org.locationtech.jts.index.SpatialIndex spatialIndex,
                                                                          Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes,
                                                                          boolean buildLeft)
        Iterator model for the index-based join. It returns 1 pair in the current batch. Each batch contains a list of pairs of geometries that satisfy the join condition. The current batch is the result of the current stream shape against all the build shapes.
        Parameters:
        spatialIndex -
        streamShapes -
        buildLeft -
        Returns:
      • nextBase

        protected org.apache.commons.lang3.tuple.Pair<U,​T> nextBase​(List<? extends org.locationtech.jts.geom.Geometry> buildShapes,
                                                                          Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes)
        Iterator model for the nest loop join. It returns 1 pair in the current batch. Each batch contains a list of pairs of geometries that satisfy the join condition. The current batch is the result of the current stream shape against all the build shapes.
        Parameters:
        buildShapes -
        streamShapes -
        Returns:
      • log

        protected void log​(String message,
                           Object... params)