Remove inappropriate words from your random text selections #1511

Closed
eklicious opened this issue Mar 15, 2024 · 3 comments
Assignees
Labels
enhancement Enhancement proposals

Comments

@eklicious

Feature request

Go through all your data sets and remove inappropriate words.

Thesis

For example, text.json contains words like 'milf' and 'milfhunter'. Those need to be removed because customers end up seeing them in their sample data sets, and that doesn't make anyone look good.

Reasoning

If you want companies using your tool, you need to cleanse the data.

@lk-geimfari
Owner

I completely agree with this. The problem is that this data was collected from all over the internet, and not by me alone, so obviously I haven't reviewed and verified all of it. It's also worth noting that this kind of data got there by accident.

I take this problem seriously. Fixing it will be a top priority for the next release.

@lk-geimfari lk-geimfari added the enhancement Enhancement proposals label Mar 15, 2024
@lk-geimfari lk-geimfari self-assigned this Mar 15, 2024
@lk-geimfari
Owner

Well, I removed everything I found using: https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words/tree/master

I hope this will improve the quality of the datasets and that there won't be bad words in them, but I can't guarantee it, because checking every dataset word by word is not physically possible for me.
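
For anyone who wants to run a similar pass over their own copy of the data, here is a minimal sketch of blocklist-based filtering. It is not the script actually used for this fix. The function names `load_blocklist` and `clean`, and the layout of the sample data, are assumptions made for illustration; the real `text.json` structure may differ.

```python
import json
from typing import Any


def load_blocklist(lines: list[str]) -> set[str]:
    """Normalize blocklist entries: strip whitespace, lowercase, skip blanks."""
    return {line.strip().lower() for line in lines if line.strip()}


def clean(node: Any, blocklist: set[str]) -> Any:
    """Recursively drop blocklisted strings from lists inside JSON-like data."""
    if isinstance(node, dict):
        return {key: clean(value, blocklist) for key, value in node.items()}
    if isinstance(node, list):
        return [
            clean(item, blocklist)
            for item in node
            # Exact, case-insensitive match only; substrings are not checked.
            if not (isinstance(item, str) and item.strip().lower() in blocklist)
        ]
    return node


# Hypothetical sample data; the real dataset layout is an assumption here.
sample = {"words": ["apple", "Milf", "river", "milfhunter"], "level": ["low", "high"]}

# In practice the lines would come from a word-list file, one entry per line.
blocklist = load_blocklist(["milf\n", "milfhunter\n", "\n"])

cleaned = clean(sample, blocklist)
print(json.dumps(cleaned))
```

This only removes list items that match a blocklist entry exactly (ignoring case), so offensive substrings inside longer strings and words missing from the list will still get through, which is consistent with the caveat above that a clean result can't be guaranteed.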

@lk-geimfari
Owner

Version 16.0.0 with fixes is available now.


2 participants