Error while using bucket partitions #274
Yeah, you need to get the bucket function to Spark and sort by that. Here's how to create a Spark UDF with the function:

```scala
import com.netflix.iceberg.transforms.Transforms
import com.netflix.iceberg.types.Types

// load the bucket transform from Iceberg to use as a UDF
val bucketTransform = Transforms.bucket[java.lang.Long](Types.LongType.get(), 16)

// needed because Scala has trouble with the Java transform type
def bucketFunc(id: Long): Int = bucketTransform.apply(id)

// create and register a UDF
val bucket16 = spark.udf.register("bucket16", bucketFunc _)
```

Then you can use it like this:

```sql
INSERT INTO table SELECT id, data FROM source ORDER BY bucket16(id)
```
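The reason a plain `ORDER BY id` doesn't line up with the partitions is that the bucket number is derived from a hash of the value, so consecutive ids scatter across buckets. Here is a minimal stand-alone Java sketch of that idea; note the hash below is a simple mixing-constant stand-in, not Iceberg's actual Murmur3 hash of the value's bytes, so real bucket numbers will differ:

```java
public class BucketSketch {
    // Conceptually, a bucket transform is: bucket = positiveHash(value) % numBuckets.
    // Iceberg hashes the value's serialized bytes with Murmur3; the multiply-and-shift
    // mix below is an illustrative stand-in only.
    static int bucket(long id, int numBuckets) {
        int hash = (int) ((id * 0x9E3779B97F4A7C15L) >>> 32); // stand-in mixing hash
        return (hash & Integer.MAX_VALUE) % numBuckets;       // force non-negative, then mod
    }

    public static void main(String[] args) {
        // Consecutive ids land in scattered buckets, which is why sorting by the
        // raw column does not cluster rows by bucket.
        for (long id = 0; id < 8; id++) {
            System.out.println("id " + id + " -> bucket " + bucket(id, 16));
        }
    }
}
```

Because the mapping from id to bucket is effectively random, the only way to write rows clustered by partition is to sort by the bucket value itself, which is what registering the transform as a UDF makes possible.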
@rdblue Thanks, let me try it out
@moulimukherjee, did that solve the problem?
@rdblue Yes, it did. Thanks!
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.
Seeing the following error while using the bucket partition.
The relevant code looks like:
Sorting by the column does not help, as it's bucketed using a hash.