Issues: PlanTL-GOB-ES/corpus-cleaner

Issues list

Line breaks in biomedical corpus
#96 opened Feb 4, 2022 by jordiae
Port corpus-cleaner to Distify
#93 opened Nov 3, 2021 by jordiae
Verify (and fix) potential CPU overloading [bug]
#88 opened Jun 1, 2021 by asier-gutierrez
Master node's RAM allocation is larger [bug]
#87 opened Jun 1, 2021 by asier-gutierrez
RAM Memory Leak or Unexpected Memory Allocation [bug]
#86 opened Jun 1, 2021 by asier-gutierrez
BSC Crawl data parser: url = url & keywords = url? [bug]
#85 opened May 26, 2021 by asier-gutierrez
Error: "Processed Expecting value: line 1..." when processing large corpus. [bug] [help wanted]
#84 opened May 6, 2021 by asier-gutierrez
Check "Processed None(?) into OutputFormatterMapper" message [bug] [help wanted]
#83 opened May 4, 2021 by asier-gutierrez
Tackle distributed computing limitation with 50<x<600 nodes [bug] [enhancement] [help wanted]
#82 opened May 3, 2021 by asier-gutierrez
Accept PDF format [enhancement]
#81 opened Feb 22, 2021 by onadegibert
Refactoring roadmap
#80 opened Feb 19, 2021 by jordiae
None filter not working [bug]
#79 opened Feb 19, 2021 by asier-gutierrez
OOM error while processing big file (150GB) [bug]
#77 opened Jan 21, 2021 by onadegibert
Roadmap v2 [enhancement]
#76 opened Dec 17, 2020 by jordiae
Rethink deduplication
#75 opened Dec 2, 2020 by jordiae
Fix new deploy: few onions and slow speed [bug]
#73 opened Nov 26, 2020 by onadegibert
Improve encoding checking in WARC
#46 opened Aug 4, 2020 by jordiae
Improve code regex
#45 opened Aug 4, 2020 by jordiae
Improve logging
#43 opened Aug 4, 2020 by jordiae
Test pipeline with BETO's corpus
#41 opened Aug 4, 2020 by jordiae
Test pipeline with BNE
#40 opened Aug 4, 2020 by jordiae
Add proper testing
#39 opened Aug 4, 2020 by jordiae