Issues: IBM/data-prep-kit

Issues list

[Feature] New DPK transform to get the distributions of quality metrics enhancement New feature or request
#1045 opened Feb 11, 2025 by Hajar-Emami
1 of 2 tasks
[Bug] Pdf2parquet inbuilt ocr error bug Something isn't working
#1042 opened Feb 11, 2025 by ShiroYasha18
1 of 2 tasks
[Feature] Enable crawling of websites that require credentials via SSO or 2FA enhancement New feature or request
#1040 opened Feb 11, 2025 by touma-I
1 of 2 tasks
[Bug] Error running lang_id and code_quality kfp pipelines bug Something isn't working
#1038 opened Feb 11, 2025 by revit13
1 of 2 tasks
[Bug] fix website: ibm.github.io/data-prep-kit bug Something isn't working
#1037 opened Feb 10, 2025 by sujee
2 tasks done
Rep_removal for large data files crashes on 16GB memory bug Something isn't working
#1035 opened Feb 10, 2025 by shahrokhDaijavad
1 of 2 tasks
[Feature] Update PII sample notebook to use simple APIs enhancement New feature or request
#1032 opened Feb 10, 2025 by sujee
2 tasks done
On-boarding Multimodal and Multi-lingual transforms to DPK enhancement New feature or request
#1020 opened Feb 6, 2025 by shahrokhDaijavad
1 of 2 tasks
Improve performance of the Readability transform enhancement New feature or request
#1015 opened Feb 5, 2025 by shahrokhDaijavad
1 of 2 tasks
Tokenizing parquet files to arrow tables enhancement New feature or request
#1009 opened Feb 4, 2025 by shahrokhDaijavad
1 of 2 tasks
Simplification of how users interact with the rep_removal transform enhancement New feature or request
#1007 opened Jan 31, 2025 by shahrokhDaijavad
1 of 2 tasks
[Bug] Update lang_id readme file with list of languages that it supports bug Something isn't working
#1005 opened Jan 31, 2025 by touma-I
1 of 2 tasks
[Feature] how to find which DPK 'modules' are installed enhancement New feature or request
#996 opened Jan 29, 2025 by sujee
1 of 2 tasks
[Bug] Unable to access quay.io/dataprep1/data-prep-kit/doc_chunk-ray:latest bug Something isn't working
#995 opened Jan 29, 2025 by touma-I
2 tasks done
[Bug] Web2parquet fails on Windows bug Something isn't working
#990 opened Jan 28, 2025 by touma-I
1 of 2 tasks
[Bug] FDedup Fails on Windows bug Something isn't working
#989 opened Jan 28, 2025 by touma-I
1 of 2 tasks
[Bug] Wrong Ray cluster name bug Something isn't working
#988 opened Jan 28, 2025 by roytman
1 of 2 tasks
[Bug] The S3 secret name is hardcoded in the KFP library bug Something isn't working
#985 opened Jan 28, 2025 by roytman
2 tasks done
[Bug] FDedup failing with latest release mmh3==5.1.0 bug Something isn't working
#982 opened Jan 27, 2025 by touma-I
1 of 2 tasks
Bloom annotator implementation for GneissWeb data enhancement New feature or request sprint-feb-7
#981 opened Jan 27, 2025 by shahrokhDaijavad
2 tasks done