使用Decoder-only的Transformer进行时序预测,包含SwiGLU和RoPE(Rotary Positional Embedding),Time series prediction using Decoder-only Transformer, Including SwiGLU and RoPE(Rotary Positional Embedding)
-
Updated
Jan 25, 2024 - Python
使用Decoder-only的Transformer进行时序预测,包含SwiGLU和RoPE(Rotary Positional Embedding),Time series prediction using Decoder-only Transformer, Including SwiGLU and RoPE(Rotary Positional Embedding)
Simple and easy to understand PyTorch implementation of Large Language Model (LLM) GPT and LLAMA from scratch with detailed steps. Implemented: Byte-Pair Tokenizer, Rotational Positional Embedding (RoPe), SwishGLU, RMSNorm, Mixture of Experts (MOE). Tested on Taylor Swift song lyrics dataset.
Add a description, image, and links to the swiglu topic page so that developers can more easily learn about it.
To associate your repository with the swiglu topic, visit your repo's landing page and select "manage topics."