package spatialPartitioning
Type Members
-
class
BroadcastedSpatialPartitioner extends SpatialPartitioner
The SpatialPartitioner may contain a large number of grids, which may make the serialized tasks to be larger than 1MB and trigger a warning: "WARN DAGScheduler: Broadcasting large task binary with size XXXX KB".
The SpatialPartitioner may contain a large number of grids, which may make the serialized tasks to be larger than 1MB and trigger a warning: "WARN DAGScheduler: Broadcasting large task binary with size XXXX KB". This class is a wrapper around a SpatialPartitioner that is broadcasted to reduce the size of serialized tasks.
-
class
EqualPartitioning extends Serializable
The Class EqualPartitioning.
- class FlatGridPartitioner extends SpatialPartitioner
- class HilbertPartitioning extends Serializable
-
class
KDB extends PartitioningUtils with Serializable
see https://en.wikipedia.org/wiki/K-D-B-tree
- class KDBTreePartitioner extends SpatialPartitioner
- abstract class PartitioningUtils extends AnyRef
- class QuadTreePartitioner extends SpatialPartitioner
-
class
QuadTreeRTPartitioner extends QuadTreePartitioner
This class implements spatial partitioner based on the principles outlined in:
This class implements spatial partitioner based on the principles outlined in:
It uses the Quad-tree partitioning strategy to create balanced partitions This partitioning is essential for efficiently executing kNN joins. The process involves the following steps:
1. **Quad-tree Partitioning**: - Data is partitioned into partitions using the Quad-tree data structure. - This ensures balanced partitions and spatial locality preservation.
2. **Global R-Tree Index Construction**: - A set of random samples S' from the dataset S is taken. - An R-tree T is built over S' in the master node (driver program).
3. **Distance Bound Calculation**: - For each partition Ri, the distance ui from the furthest point in Ri to the centroid cri is calculated. - k-nearest neighbors of each centroid cri are found using the R-tree T. - A distance bound γi is derived for each Ri, defined as γi = 2ui +
|cri, sk|, where sk is the k-th nearest neighbor of cri.
4. **Partitioning Neighbors**: - For each partition Ri, a subset Si ⊂ S is identified such that for any r ∈ Ri, knn(r, S) = knn(r, Si) using a circle range query centered at cri with radius γi. - This guarantees that the k-nearest neighbors of any point in Ri can be found within the subset Si.
5. **Parallel Local kNN Joins**: - Each combined partition (Ri, Si) is processed in parallel. - An R-tree is built over Si, and a local kNN join is performed for each record in Ri. - The results from all partitions are combined to produce the final kNN join results.
Reference: Xie, Dong, et al. "Simba: Efficient in-memory spatial analytics." In Proceedings of the 2016 ACM SIGMOD International Conference on Management of Data (SIGMOD '16), 2016, DOI: 10.1145/2882903.2915237.
-
class
QuadTreeRTPartitioning extends QuadtreePartitioning
The class is used to build an R-tree over a random sample of another dataset and uses distance bounds to ensure efficient local kNN joins.
The class is used to build an R-tree over a random sample of another dataset and uses distance bounds to ensure efficient local kNN joins.
By calculating distance bounds and using circle range queries, it ensures that the subsets Si, containing all necessary points for accurate kNN results. The final union of local join results provides the complete kNN join result for the datasets R and S.
It generates List<List<Integer>> expandedPartitionedBoundaries based on the quad tree.
- class QuadtreePartitioning extends Serializable
-
class
RtreePartitioning extends Serializable
The Class RtreePartitioning.
- abstract class SpatialPartitioner extends Partitioner with Serializable
-
class
VoronoiPartitioning extends Serializable
The Class VoronoiPartitioning.