public class RDDSampleUtils extends Object
Constructor and Description |
---|
RDDSampleUtils() |
Modifier and Type | Method and Description |
---|---|
static int | getSampleNumbers(int numPartitions, long totalNumberOfRecords, int givenSampleNumbers) Returns the number of samples to take to partition the RDD into the specified number of partitions. |
public static int getSampleNumbers(int numPartitions, long totalNumberOfRecords, int givenSampleNumbers)
The number of partitions cannot exceed half the number of records in the RDD.
Returns the total number of records if it is less than 1000. Otherwise, returns the larger of 1% of the total number of records and twice the number of partitions. The result never exceeds Integer.MAX_VALUE.
If givenSampleNumbers is not -1, the method returns that value instead of computing a default.
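To make the precedence of these rules concrete, here is a minimal Java sketch of the documented contract. It is not the Apache Sedona source: the class name SampleNumbersSketch and the constant USE_DEFAULT are hypothetical, treating -1 as the only "compute a default" sentinel follows the wording above, and checking the partition limit only when no explicit sample count is given is an assumption about check ordering.

```java
/*
 * Hypothetical sketch of the documented RDDSampleUtils.getSampleNumbers contract.
 * This is NOT the Apache Sedona implementation; the order of checks is an assumption.
 */
final class SampleNumbersSketch {

    /** Sentinel meaning "no explicit sample count; compute a default" (per the Javadoc). */
    static final int USE_DEFAULT = -1;

    private SampleNumbersSketch() {
        // Static utility; not meant to be instantiated.
    }

    static int getSampleNumbers(int numPartitions, long totalNumberOfRecords, int givenSampleNumbers) {
        // 1. An explicit request wins, but it may not exceed the number of records.
        if (givenSampleNumbers != USE_DEFAULT) {
            if (givenSampleNumbers > totalNumberOfRecords) {
                throw new IllegalArgumentException("Requested " + givenSampleNumbers
                        + " samples, but there are only " + totalNumberOfRecords + " records");
            }
            return givenSampleNumbers;
        }

        // 2. Partitions may not exceed half the records (assumed: checked only on the default path).
        //    2L * numPartitions uses long arithmetic so it cannot overflow.
        if (2L * numPartitions > totalNumberOfRecords) {
            throw new IllegalArgumentException("Requested " + numPartitions
                    + " partitions, which exceeds half of " + totalNumberOfRecords + " records");
        }

        // 3. Small inputs: sample every record.
        if (totalNumberOfRecords < 1000) {
            return (int) totalNumberOfRecords;
        }

        // 4. Otherwise take the larger of 1% of the records and twice the partition count,
        //    capped at Integer.MAX_VALUE before narrowing to int.
        long onePercent = totalNumberOfRecords / 100;
        long twicePartitions = 2L * numPartitions;
        return (int) Math.min(Math.max(onePercent, twicePartitions), Integer.MAX_VALUE);
    }
}
```

Under these assumptions, getSampleNumbers(4, 500, -1) returns 500 (fewer than 1000 records), getSampleNumbers(10, 1_000_000, -1) returns 10000 (1% is larger), and getSampleNumbers(1000, 50_000, -1) returns 2000 (twice the partition count is larger). Doing the comparison and the cap in long arithmetic before the final cast keeps the result within Integer.MAX_VALUE even for very large record counts.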
Parameters:
numPartitions - the number of partitions
totalNumberOfRecords - the total number of records
givenSampleNumbers - the requested number of samples, or -1 to let the method choose
Throws:
IllegalArgumentException - if the requested number of samples exceeds the total number of records, or if the requested number of partitions exceeds half of the total number of records
Copyright © 2023 The Apache Software Foundation. All rights reserved.