Faster perplexity computation #2786

Merged: ikawrakow merged 1 commit into master from ik/faster_ppl on Aug 25, 2023

@ikawrakow ikawrakow commented Aug 25, 2023

Time to compute 7B perplexity on Wikitext with a context of 512 on an RTX-4080 goes from 143 seconds to 128 seconds (roughly 10% faster). Not a big deal for most people, I guess, but a nice speedup for someone like me who runs lots of perplexity calculations while trying different quantization techniques.

Also added output of the statistical uncertainty of the computed perplexity. With `--ppl-output-type 1` the perplexity tool now outputs 4 columns: number of evaluated tokens, perplexity, average negative log probability, and the uncertainty of that average.
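
For readers who want to see how the four quantities relate, here is a minimal Python sketch. It is not the code from this PR: the function name `perplexity_stats` is hypothetical, and the uncertainty is taken to be the standard error of the mean of the per-token negative log probabilities, which assumes the tokens are statistically independent. The PR description does not say which estimator the tool actually uses.

```python
import math


def perplexity_stats(neg_log_probs):
    """Return (n_tokens, perplexity, mean_nll, nll_uncertainty).

    neg_log_probs: per-token negative natural-log probabilities.
    Hypothetical helper for illustration; not part of llama.cpp.
    """
    n = len(neg_log_probs)
    if n < 2:
        raise ValueError("need at least two tokens to estimate an uncertainty")

    # Average negative log probability over all evaluated tokens.
    mean_nll = sum(neg_log_probs) / n

    # Unbiased sample variance of the per-token values.
    variance = sum((x - mean_nll) ** 2 for x in neg_log_probs) / (n - 1)

    # Standard error of the mean (assumes independent tokens).
    nll_uncertainty = math.sqrt(variance / n)

    # Perplexity is the exponential of the average negative log probability.
    perplexity = math.exp(mean_nll)

    return n, perplexity, mean_nll, nll_uncertainty


if __name__ == "__main__":
    n, ppl, nll, err = perplexity_stats([1.0, 2.0, 3.0])
    print(n, ppl, nll, err)
```

Under the same assumption, first-order error propagation gives an approximate uncertainty on the perplexity itself of `perplexity * nll_uncertainty`, which is handy when judging whether two quantization variants really differ.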

@ikawrakow ikawrakow merged commit d046dce into master Aug 25, 2023
@ikawrakow ikawrakow deleted the ik/faster_ppl branch August 25, 2023 16:05
mattgauf added a commit to mattgauf/llama.cpp that referenced this pull request Aug 26, 2023
* master: (773 commits)
  server : add `/detokenize` endpoint (ggerganov#2802)
  convert.py : advanced option (ggerganov#2753)
  llama : use Unicode Escape Sequence to replace encoded characters (ggerganov#2814)
  flake.nix : add rocm support and cleanup (ggerganov#2808)
  llama : move #includes out of _GNU_SOURCE conditional (ggerganov#2817)
  main : fix bug (penalize_nl=false doesn't work) + suppress warning on mingw (ggerganov#1528)
  llama : use std::abs in llama_sample_tail_free (ggerganov#2800)
  k-quants : remove unnecessary tensor shape restrictions (ggerganov#2811)
  Better perplexity for 2- and 3-bit quantization for LLaMA-v2-70B (ggerganov#2807)
  Fix HellaSwag (ggerganov#2805)
  flake : build llama.cpp on Intel with nix (ggerganov#2795)
  Handle null rope scaling value (ggerganov#2793)
  Fix spm whitespaces (ggerganov#2806)
  examples : skip unnecessary external lib in server README.md how-to (ggerganov#2804)
  llama : fix struct decl (ggerganov#2790)
  Faster perplexity computation (ggerganov#2786)
  llama : add llama_beam_search() (ggerganov#2267)
  convert.py : Get rope scale from HuggingFace models (ggerganov#2772)
  llama-bench : add model sizes (ggerganov#2771)
  convert.py : export rope freq_base when converting CodeLlama from an HF model (ggerganov#2773)
  ...
akawrykow pushed a commit to akawrykow/llama.cpp that referenced this pull request Aug 29, 2023
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>