Grammar, punctuation, and typo fixes #159

Open · wants to merge 1 commit into master
README.md: 22 changes (11 additions & 11 deletions)
@@ -6,7 +6,7 @@ Easily train your own text-generating neural network of any size and complexity

textgenrnn is a Python 3 module on top of [Keras](https://github.com/fchollet/keras)/[TensorFlow](https://www.tensorflow.org) for creating [char-rnn](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)s, with many cool features:

* A modern neural network architecture which utilizes new techniques as attention-weighting and skip-embedding to accelerate training and improve model quality.
* A modern neural network architecture which utilizes new techniques such as attention-weighting and skip-embedding to accelerate training and improve model quality.
* Train on and generate text at either the character-level or word-level.
* Configure RNN size, the number of RNN layers, and whether to use bidirectional RNNs.
* Train on any generic input text file, including large files.
@@ -40,7 +40,7 @@ textgen.generate()
Project State Project Firefox
```

The model weights are relatively small (2 MB on disk), and they can easily be saved and loaded into a new textgenrnn instance. As a result, you can play with models which have been trained on hundreds of passes through the data. (in fact, textgenrnn learns *so well* that you have to increase the temperature significantly for creative output!)
The model weights are relatively small (2 MB on disk), and they can easily be saved and loaded into a new textgenrnn instance. As a result, you can play with models which have been trained on hundreds of passes through the data. In fact, textgenrnn learns *so well* that you have to increase the temperature significantly for creative output!

```python
textgen_2 = textgenrnn('/weights/hacker_news.hdf5')
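# A minimal follow-on sketch, assuming generate() exposes n and temperature
# parameters as described in the textgenrnn docs: raise the temperature
# for more creative output from the loaded weights.
textgen_2.generate(3, temperature=1.0)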
@@ -72,7 +72,7 @@ textgen.generate(interactive=True, top_n=5)

![word_level_demo](/docs/textgenrnn_interactive.gif)

This can add a *human touch* to the output; it feels like you're the writer! ([reference](https://fivethirtyeight.com/features/some-like-it-bot/))
This can add a *human touch* to the output; it feels like you're the writer! ([reference](https://fivethirtyeight.com/features/some-like-it-bot/)).

## Usage

@@ -98,11 +98,11 @@ textgenrnn is based off of the [char-rnn](https://github.com/karpathy/char-rnn)

![default model](/docs/default_model.png)

The included pretrained-model follows a [neural network architecture](https://github.com/bfelbo/DeepMoji/blob/master/deepmoji/model_def.py) inspired by [DeepMoji](https://github.com/bfelbo/DeepMoji). For the default model, textgenrnn takes in an input of up to 40 characters, converts each character to a 100-D character embedding vector, and feeds those into a 128-cell long-short-term-memory (LSTM) recurrent layer. Those outputs are then fed into *another* 128-cell LSTM. All three layers are then fed into an Attention layer to weight the most important temporal features and average them together (and since the embeddings + 1st LSTM are skip-connected into the attention layer, the model updates can backpropagate to them more easily and prevent vanishing gradients). That output is mapped to probabilities for up to [394 different characters](/textgenrnn/textgenrnn_vocab.json) that they are the next character in the sequence, including uppercase characters, lowercase, punctuation, and emoji. (if training a new model on a new dataset, all of the numeric parameters above can be configured)
The included pretrained-model follows a [neural network architecture](https://github.com/bfelbo/DeepMoji/blob/master/deepmoji/model_def.py) inspired by [DeepMoji](https://github.com/bfelbo/DeepMoji). For the default model, textgenrnn takes in an input of up to 40 characters, converts each character to a 100-D character embedding vector, and feeds those into a 128-cell long-short-term-memory (LSTM) recurrent layer. Those outputs are then fed into *another* 128-cell LSTM. All three layers are then fed into an Attention layer to weight the most important temporal features and average them together (and since the embeddings + 1st LSTM are skip-connected into the attention layer, the model updates can backpropagate to them more easily and prevent vanishing gradients). The output of the attention layer is then mapped to probabilities for up to [394 different characters](/textgenrnn/textgenrnn_vocab.json) that are the next character in the sequence, including uppercase characters, lowercase, punctuation, and emoji. If training a new model on a new dataset, all of the numeric parameters above can be configured.
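
As a rough sketch of the architecture described above (not the package's actual `model_def`; the layer arrangement is illustrative and `GlobalAveragePooling1D` stands in for the attention-weighted average), a comparable Keras model might be assembled like this:

```python
from tensorflow.keras.layers import (Input, Embedding, LSTM, concatenate,
                                     GlobalAveragePooling1D, Dense)
from tensorflow.keras.models import Model

MAXLEN, VOCAB_SIZE = 40, 394  # sequence length and character vocabulary quoted above

inp = Input(shape=(MAXLEN,))
emb = Embedding(VOCAB_SIZE, 100)(inp)            # 100-D character embeddings
rnn_1 = LSTM(128, return_sequences=True)(emb)    # first 128-cell LSTM
rnn_2 = LSTM(128, return_sequences=True)(rnn_1)  # second 128-cell LSTM

# skip-connect the embeddings and both LSTM outputs into the attention stage
seq = concatenate([emb, rnn_1, rnn_2])
pooled = GlobalAveragePooling1D()(seq)           # stand-in for the attention-weighted average

out = Dense(VOCAB_SIZE, activation='softmax')(pooled)
model = Model(inputs=inp, outputs=out)
```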

![context model](/docs/context_model.png)

Alternatively, if context labels are provided with each text document, the model can be trained in a contextual mode, where the model learns the text *given the context* so the recurrent layers learn the *decontextualized* language. The text-only path can piggy-back off the decontextualized layers; in all, this results in much faster training and better quantitative and qualitative model performance than just training the model gien the text alone.
Alternatively, if context labels are provided with each text document, the model can be trained in a contextual mode, where the model learns the text *given the context* so the recurrent layers learn the *decontextualized* language. The text-only path can piggy-back off the decontextualized layers; in all, this results in much faster training and better quantitative and qualitative model performance than just training the model given the text alone.
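
As a hedged sketch of what contextual training might look like (assuming `train_on_texts` accepts `context_labels` and `num_epochs` arguments; the texts and labels here are purely illustrative):

```python
from textgenrnn import textgenrnn

texts = ['never gonna give you up, never gonna let you down',
         'S P O O K Y S C A R Y S K E L E T O N S']
context_labels = ['PARTY', 'SPOOKY']  # one label per document

textgen = textgenrnn()
textgen.train_on_texts(texts, context_labels=context_labels, num_epochs=2)
```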

The model weights included with the package are trained on hundreds of thousands of text documents from Reddit submissions ([via BigQuery](http://minimaxir.com/2015/10/reddit-bigquery/)), from a very *diverse* variety of subreddits. The network was also trained using the decontextual approach noted above in order to both improve training performance and mitigate authorial bias.

Expand All @@ -120,17 +120,17 @@ Additionally, the retraining is done with a momentum-based optimizer and a linea

## Future Plans for textgenrnn

* More formal documentation
* More formal documentation.

* A web-based implementation using tensorflow.js (works especially well due to the network's small size)
* A web-based implementation using tensorflow.js (works especially well due to the network's small size).

* A way to visualize the attention-layer outputs to see how the network "learns."
* A way to visualize the attention-layer outputs to see how the network "learns".

* A mode to allow the model architecture to be used for chatbot conversations (may be released as a separate project)
* A mode to allow the model architecture to be used for chatbot conversations (may be released as a separate project).

* More depth toward context (positional context + allowing multiple context labels)
* More depth toward context (positional context + allowing multiple context labels).

* A larger pretrained network which can accommodate longer character sequences and a more indepth understanding of language, creating better generated sentences.
* A larger pretrained network which can accommodate longer character sequences and a more in-depth understanding of language, creating better generated sentences.

* Hierarchical softmax activation for word-level models (once Keras has good support for it).
