Transformer Language Modeling: A notebook showing how to implement a decoder transformer model and train it for language modeling on the WebText dataset. Note: this is a work in progress.