Large rework of how punctuation is handled #16

Merged: 4 commits, Jul 27, 2021

tomaarsen (Owner) commented Jun 22, 2021

  • Punctuation marks (commas, dots, apostrophes, etc.) are now split off as separate words.
  • Update the database accordingly, and create a ..._backup.db file containing the old database.
  • Automatically update the settings.json file.
  • Further improve code quality.
  • Add a Tokenizer file that handles splitting a sentence into tokens and merging tokens back into sentences.
  • Add a SentenceSeparator value, which is placed in between sentences when multiple sentences are generated (this only happens when the first sentence was too short according to MinSentenceWordAmount).
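
For readers who want a feel for the tokenizer change, here is a rough sketch of the idea, not the code from this PR. The names `tokenize`, `detokenize` and `join_sentences`, the regular expression, and the `"-"` separator value are all assumptions made for illustration; the actual Tokenizer file may work quite differently.

```python
import re
from typing import List

# Hypothetical sketch: none of these names are taken from the PR itself.
# A token is either a run of word characters or one punctuation character.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

# Punctuation that attaches to the preceding token when merging back.
ATTACH_LEFT = set(".,!?;:")


def tokenize(sentence: str) -> List[str]:
    """Split a sentence so that each punctuation mark is its own token."""
    return TOKEN_RE.findall(sentence)


def detokenize(tokens: List[str]) -> str:
    """Merge tokens back into a sentence.

    Apostrophes are glued to both neighbours, so "It", "'", "s" becomes
    "It's". This is a simplification: quotes written with apostrophes
    will not round-trip correctly.
    """
    sentence = ""
    prev = ""
    for token in tokens:
        glue = (
            not sentence
            or token in ATTACH_LEFT
            or token == "'"
            or prev == "'"
        )
        sentence += token if glue else " " + token
        prev = token
    return sentence


def join_sentences(sentences: List[List[str]], separator: str = "-") -> str:
    """Place the separator in between generated sentences (assumed behaviour)."""
    return f" {separator} ".join(detokenize(tokens) for tokens in sentences)


tokens = tokenize("Hello, world! It's here.")
print(tokens)
# ['Hello', ',', 'world', '!', 'It', "'", 's', 'here', '.']
print(detokenize(tokens))
# Hello, world! It's here.
print(join_sentences([["Hi", "!"], ["How", "are", "you", "?"]]))
# Hi! - How are you?
```

Splitting punctuation into its own tokens means "world" and "world!" share a single entry in the chain, which is presumably why the database needs to be migrated alongside this change.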

Left to do:

  • Get the README back up to speed.

tomaarsen (Owner, Author) commented Jul 27, 2021

The README has been updated; merging into updater_1.

tomaarsen merged commit e6addd5 into updater_1 on Jul 27, 2021
tomaarsen deleted the updater_1_punctuation branch on July 27, 2021 11:32
tomaarsen restored the updater_1_punctuation branch on July 27, 2021 11:32
tomaarsen deleted the updater_1_punctuation branch on July 27, 2021 11:37