public class RDDSampleUtils extends Object
| Constructor and Description |
|---|
| RDDSampleUtils() |
| Modifier and Type | Method and Description |
|---|---|
| static int | getSampleNumbers(int numPartitions, long totalNumberOfRecords, int givenSampleNumbers) Returns the number of samples to take to partition the RDD into the specified number of partitions. |
public static int getSampleNumbers(int numPartitions,
long totalNumberOfRecords,
int givenSampleNumbers)
The number of partitions cannot exceed half the number of records in the RDD.
If the total number of records is less than 1000, returns the total number of records. Otherwise, returns the larger of 1% of the total number of records and twice the number of partitions. The result never exceeds Integer.MAX_VALUE.
If the desired number of samples (givenSampleNumbers) is not -1, returns that number instead.
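The rules above can be sketched as a small standalone method. This is a minimal illustration reconstructed from the description only, not the library's actual source: the class name `SampleNumberSketch` is hypothetical, and the order of the checks (explicit sample count first, then the partition limit) is an assumption, since the documentation does not state which check runs first.

```java
// Hypothetical sketch of the documented rules; not the library's implementation.
final class SampleNumberSketch {

    static int getSampleNumbers(int numPartitions,
                                long totalNumberOfRecords,
                                int givenSampleNumbers) {
        // An explicit request (anything other than -1) is returned as-is,
        // provided it does not exceed the number of records.
        // Assumption: this check runs before the partition check.
        if (givenSampleNumbers != -1) {
            if (givenSampleNumbers > totalNumberOfRecords) {
                throw new IllegalArgumentException(
                        "Requested samples exceed total number of records");
            }
            return givenSampleNumbers;
        }

        // The number of partitions may not exceed half the number of records.
        // 2L forces long arithmetic so a large numPartitions cannot overflow.
        long twicePartitions = 2L * numPartitions;
        if (twicePartitions > totalNumberOfRecords) {
            throw new IllegalArgumentException(
                    "Number of partitions exceeds half of total number of records");
        }

        // Small inputs: every record is used as a sample.
        if (totalNumberOfRecords < 1000) {
            return (int) totalNumberOfRecords;
        }

        // Otherwise: the larger of 1% of the records and twice the partitions,
        // capped at Integer.MAX_VALUE so the cast to int is safe.
        long onePercent = totalNumberOfRecords / 100;
        return (int) Math.min(Math.max(onePercent, twicePartitions), Integer.MAX_VALUE);
    }
}
```

Under these assumptions, 4 partitions over 100000 records with `givenSampleNumbers = -1` gives 1000 (1% of the records), while 1000 partitions over the same records gives 2000 (twice the number of partitions).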
Parameters:
numPartitions - the number of partitions
totalNumberOfRecords - the total number of records
givenSampleNumbers - the given sample numbers
Throws:
IllegalArgumentException - if the requested number of samples exceeds the total number of records, or if the requested number of partitions exceeds half of the total number of records

Copyright © 2023 The Apache Software Foundation. All rights reserved.