Search before asking
I searched the issues and found no similar issues.
Component
Transforms/Other
Feature
In the GneissWeb recipe, after the rep-removal transform and before the extreme-tokenizer transform, we need to convert parquet files to arrow tables. It seems this was already done internally by @santoshborse as a transform called tokenization2arrow.
@santoshborse Is it possible for you to bring your transform to the open repo?
cc: @touma-I @Hajar-Emami
Are you willing to submit a PR?