Issues: IBM/data-prep-kit
Issues list
[Feature] New DPK transform to get the distributions of quality metrics
enhancement
New feature or request
#1045
opened Feb 11, 2025 by
Hajar-Emami
1 of 2 tasks
[Feature] Filter both the parquet and arrow files and update the metadata simultaneously
enhancement
New feature or request
#1044
opened Feb 11, 2025 by
Hajar-Emami
1 of 2 tasks
[Bug] Pdf2parquet inbuilt OCR error
bug
Something isn't working
#1042
opened Feb 11, 2025 by
ShiroYasha18
1 of 2 tasks
[Feature] Enable crawling of websites that require credentials via SSO or 2FA
enhancement
New feature or request
#1040
opened Feb 11, 2025 by
touma-I
1 of 2 tasks
[Bug] Error running lang_id and code_quality kfp pipelines
bug
Something isn't working
#1038
opened Feb 11, 2025 by
revit13
1 of 2 tasks
[Bug] fix website: ibm.github.io/data-prep-kit
bug
Something isn't working
#1037
opened Feb 10, 2025 by
sujee
2 tasks done
Rep_removal for large data files crashes on 16GB memory
bug
Something isn't working
#1035
opened Feb 10, 2025 by
shahrokhDaijavad
1 of 2 tasks
[Feature] Enabling gneissweb_classification transform by using multiple fasttext classifiers simultaneously
enhancement
New feature or request
#1034
opened Feb 10, 2025 by
Hajar-Emami
1 of 2 tasks
[Feature] Update PII sample notebook to use simple APIs
enhancement
New feature or request
#1032
opened Feb 10, 2025 by
sujee
2 tasks done
On-boarding Multimodal and Multi-lingual transforms to DPK
enhancement
New feature or request
#1020
opened Feb 6, 2025 by
shahrokhDaijavad
1 of 2 tasks
Improve performance of gneissweb_classification transform and test it with IBM GneissWeb models that are now in HuggingFace
enhancement
New feature or request
#1017
opened Feb 5, 2025 by
shahrokhDaijavad
1 of 2 tasks
Improve performance of the Readability transform
enhancement
New feature or request
#1015
opened Feb 5, 2025 by
shahrokhDaijavad
1 of 2 tasks
Tokenizing parquet files to arrow tables
enhancement
New feature or request
#1009
opened Feb 4, 2025 by
shahrokhDaijavad
1 of 2 tasks
Simplification of how users interact with the rep_removal transform
enhancement
New feature or request
#1007
opened Jan 31, 2025 by
shahrokhDaijavad
1 of 2 tasks
[Bug] Update lang_id readme file with list of languages that it supports
bug
Something isn't working
#1005
opened Jan 31, 2025 by
touma-I
1 of 2 tasks
Consistency of defined configuration parameters with the CLI Options in all transforms READMEs and Notebooks
enhancement
New feature or request
#1002
opened Jan 30, 2025 by
shahrokhDaijavad
2 tasks done
[Feature] how to find which DPK 'modules' are installed
enhancement
New feature or request
#996
opened Jan 29, 2025 by
sujee
1 of 2 tasks
[Bug] Unable to access quay.io/dataprep1/data-prep-kit/doc_chunk-ray:latest
bug
Something isn't working
#995
opened Jan 29, 2025 by
touma-I
2 tasks done
[Bug] Web2parquet fails on Windows
bug
Something isn't working
#990
opened Jan 28, 2025 by
touma-I
1 of 2 tasks
[Bug] FDedup Fails on Windows
bug
Something isn't working
#989
opened Jan 28, 2025 by
touma-I
1 of 2 tasks
[Bug] Wrong Ray cluster name
bug
Something isn't working
#988
opened Jan 28, 2025 by
roytman
1 of 2 tasks
[Bug] The S3 secret name is hardcoded in the KFP library
bug
Something isn't working
#985
opened Jan 28, 2025 by
roytman
2 tasks done
Develop a notebook that creates a pipeline (recipe) for running new GneissWeb transforms in sequence on some data of your choosing.
enhancement
New feature or request
gneiss web
sprint-Jan31
#983
opened Jan 27, 2025 by
shahrokhDaijavad
1 of 2 tasks
[Bug] FDedup failing with latest release mmh3==5.1.0
bug
Something isn't working
#982
opened Jan 27, 2025 by
touma-I
1 of 2 tasks
Bloom annotator implementation for GneissWeb data
enhancement
New feature or request
sprint-feb-7
#981
opened Jan 27, 2025 by
shahrokhDaijavad
2 tasks done