This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

Deterministic generations #27

Closed
philpax opened this issue Mar 16, 2023 · 4 comments

Comments

@philpax
Collaborator

philpax commented Mar 16, 2023

Given the same seed and prompt, the same text should be generated. This will require us to implement a deterministic PRNG (instead of using thread_rng), and to allow specifying a seed. This should also assist in benchmarking.
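As a rough illustration of what "seed in, same tokens out" could look like, here is a minimal, standard-library-only sketch. Everything in it is hypothetical: `SplitMix64`, `sample_token`, and `generate` are illustrative names rather than anything in this repository, and the fixed probability table stands in for real model output. In practice a seedable generator from the `rand` crate (for example `StdRng` built with `SeedableRng::seed_from_u64`) would be the more likely choice.

```rust
/// Hypothetical stand-in for a seedable PRNG (SplitMix64).
/// Unlike `thread_rng`, its whole state comes from the caller's seed.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform value in [0, 1), built from the top 24 bits.
    fn next_f32(&mut self) -> f32 {
        (self.next_u64() >> 40) as f32 / (1u64 << 24) as f32
    }
}

/// Samples an index from a non-empty table of non-negative weights.
fn sample_token(probs: &[f32], rng: &mut SplitMix64) -> usize {
    let total: f32 = probs.iter().sum();
    let mut threshold = rng.next_f32() * total;
    for (i, &p) in probs.iter().enumerate() {
        if threshold < p {
            return i;
        }
        threshold -= p;
    }
    // Fallback for rounding at the upper edge.
    probs.len() - 1
}

/// Toy "generation" loop: the token sequence depends only on `seed`.
fn generate(seed: u64, steps: usize) -> Vec<usize> {
    let probs = [0.1_f32, 0.2, 0.3, 0.4]; // placeholder for model output
    let mut rng = SplitMix64::new(seed);
    (0..steps).map(|_| sample_token(&probs, &mut rng)).collect()
}
```

Calling `generate(42, 16)` twice returns the same sequence, because no state is drawn from the operating system or the thread.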

@nonnull-ca

A lofty goal!

Be aware that under the hood llama (and indeed most ANNs) uses floating-point arithmetic, and floating-point determinism is a rabbit hole with no bottom.

Some particular issues that come to mind:

  • Subnormal handling (can be flushed to zero, or not).
  • Extended precision (intermediate values can be evaluated in higher precision, or not).
    • This can be done by the compiler, or in libraries.
    • E.g. it seems like ggml in some cases uses f32 for f16 evaluation
    • GCC and clang can force non-extended precision for floating-point ops... but only for named variables, not temporaries. I haven't seen any equivalent to do even that for Rust.
  • Transcendental functions in general (can vary slightly in different implementations).
    • E.g. ggml uses the host tanh / etc.
  • FPU mode bits in general (e.g. rounding modes).
    • This is thread-local state, and can be affected by things like 'what shared libraries are injected'.
    • Famously, at one point there was a printer driver that was clobbering FPU state - so if you opened a file picker in Windows your FPU results would then be different within that thread. Lovely.
  • Ordering within summations and other reductions
    • E.g. it seems like 4-wide and 8-wide SIMD implementations in ggml use different summation orderings.
    • E.g. it seems like dot product uses a different summation ordering for SIMD / non-SIMD.
  • Conversion between floats and integers (and vice versa)
    • Ties into rounding modes, above.
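To make the summation-ordering point concrete, here is a small sketch of my own (not ggml code). `sum_lanes` is a hypothetical helper that mimics a SIMD reduction by keeping one partial sum per lane and combining them at the end. Assuming plain IEEE 754 `f32` arithmetic with no extended-precision intermediates, the same four inputs give different totals for one lane and two lanes.

```rust
/// Hypothetical helper: sums `xs` with `LANES` independent accumulators,
/// the way a LANES-wide SIMD loop would, then adds the partial sums.
/// `LANES = 1` is ordinary left-to-right summation.
fn sum_lanes<const LANES: usize>(xs: &[f32]) -> f32 {
    let mut acc = [0.0_f32; LANES];
    for (i, &x) in xs.iter().enumerate() {
        acc[i % LANES] += x;
    }
    acc.iter().sum()
}

/// Inputs chosen so that the order matters: near 1e8 the spacing between
/// adjacent f32 values is 8, so `1e8 + 1.0` rounds back to `1e8`.
const INPUT: [f32; 4] = [1e8, 1.0, -1e8, 1.0];

/// Returns (scalar order, two-lane order) totals for `INPUT`.
fn compare_orderings() -> (f32, f32) {
    (sum_lanes::<1>(&INPUT), sum_lanes::<2>(&INPUT))
}
```

In the one-lane order the first `1.0` is absorbed by `1e8` and lost. With two lanes, `1e8` and `-1e8` cancel in one accumulator while both `1.0` values add up in the other, so the two totals disagree even though the inputs are identical.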

See also e.g. rust-lang/unsafe-code-guidelines#237.

All told: it's doable, with a fair bit of effort, and has been done before (look up lockstep networking for games - much the same issue). Just be aware that it's not a trivial task, especially if you demand determinism between different machines, not just between different compiles on the same machine.

@setzer22
Collaborator

I would say, for the time being, determinism given the same hardware and compiler version is a good enough goal. Going beyond that and trying to make things deterministic across different kinds of hardware is probably going to negatively affect performance.

@philpax
Collaborator Author

philpax commented Mar 17, 2023

Aye, agreed with setzer - the primary thing I want is to be able to specify the same parameters on the same machine and get the same results. We can think about offering a "fully deterministic" mode later, but as you've mentioned, madness lies that way.
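One way to check "same parameters, same machine, same results" is to fingerprint the raw bits of the output and compare fingerprints across runs. The sketch below is an assumption about how such a check might look, not something the project provides. `fingerprint` is a hypothetical helper that runs FNV-1a over the bit pattern of each `f32`, so that even `0.0` and `-0.0` count as different.

```rust
/// Hypothetical helper: FNV-1a hash over the exact bit patterns of `values`.
/// Comparing bits rather than using `==` catches differences that float
/// equality hides, such as the sign of zero.
fn fingerprint(values: &[f32]) -> u64 {
    const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
    const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

    let mut hash = FNV_OFFSET;
    for v in values {
        for byte in v.to_bits().to_le_bytes() {
            hash ^= u64::from(byte);
            hash = hash.wrapping_mul(FNV_PRIME);
        }
    }
    hash
}
```

Two runs with the same seed and prompt would be expected to produce equal fingerprints for their logits. A mismatch points at a nondeterministic step somewhere in the pipeline.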

@setzer22
Collaborator

I think this can be closed since the rest of the work to get determinism is out-of-scope for now 👍


3 participants