
Set up a Rust-first repo structure #4

Closed
wants to merge 750 commits into from

Conversation

@ErikKaum (Collaborator) commented Aug 15, 2024

This PR is no longer in draft. The main goals of this PR are:

  • a repo structure that we're all happy with
  • making sure the bindings & integration of the Rust crate work
  • a way to build the Python package, both for local dev and for a distribution build*

*The build process is not necessarily the most elegant one. We're deciding between two options: maturin vs. setuptools-rust.

  • maturin doesn't support SCM-based versioning, but is generally easier to use: `maturin build` "just works"
  • setuptools-rust allows us to use SCM-based versioning but requires more custom scripts. In particular, it doesn't play well with dependencies from parent directories; the workaround is to create a symlink, edit Cargo.toml, and then build (a sketch follows below)
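To make that workaround concrete, here is a minimal sketch in Python; the `prebuild.py` name, the paths, and the crate layout are hypothetical and not what this repo actually uses:

```python
# prebuild.py -- hypothetical helper, not part of the repo: symlink the Rust
# crate next to the Python package and point Cargo.toml at the symlink, so
# setuptools-rust can see a dependency that normally lives in a parent directory.
import os
import re
from pathlib import Path

CRATE_SRC = Path("../rust-crate")       # hypothetical parent-dir dependency
CRATE_LINK = Path("python/rust-crate")  # symlink inside the Python package tree
CARGO_TOML = Path("python/Cargo.toml")


def main() -> None:
    # Create (or refresh) the symlink so the crate appears under the package root.
    if CRATE_LINK.is_symlink():
        CRATE_LINK.unlink()
    os.symlink(CRATE_SRC.resolve(), CRATE_LINK, target_is_directory=True)

    # Rewrite the path dependency in Cargo.toml to use the symlinked location.
    text = CARGO_TOML.read_text()
    text = re.sub(r'path\s*=\s*"\.\./rust-crate"', 'path = "rust-crate"', text)
    CARGO_TOML.write_text(text)


if __name__ == "__main__":
    main()
```

The idea is only that the crate is reachable from inside the package tree before the build starts; after that, the setuptools-rust build can run as usual.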

Note: CI is currently not working with the new setup; I propose creating a new issue + PR where we address that separately.

rlouf and others added 30 commits February 20, 2024 10:20

---------

Co-authored-by: Andrew Lapp <andrew@rew.la>

Release Docker dispatch:
https://github.com/lapp0/outlines/actions/runs/7994419887
- "Fails successfully": Got to the point where it only auth errors.
`Error: buildx failed with: ERROR: denied: requested access to the
resource is denied`

Not testing the PyPI fetch. Changes are minimal between this version and the
previous *working* main:

```
git diff e99d92d -- .github/workflows/release_pypi.yaml .github/workflows/release.yml | cat
diff --git a/.github/workflows/release.yml b/.github/workflows/release_pypi.yaml
similarity index 92%
rename from .github/workflows/release.yml
rename to .github/workflows/release_pypi.yaml
index e6bf1b1..597ebb7 100644
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release_pypi.yaml
@@ -1,16 +1,17 @@
-name: Release
+name: Release PyPi
 
 on:
   release:
     types:
       - created
-
 jobs:
   release-job:
     name: Build and publish on PyPi
     runs-on: ubuntu-latest
+    environment: release
     steps:
-    - uses: actions/checkout@v2
+    - name: Checkout
+      uses: actions/checkout@v2
     - name: Set up Python
       uses: actions/setup-python@v2
       with:
```

---------

Co-authored-by: Andrew Lapp <andrew@rew.la>

We currently store the logits processor in the `LlamaCpp` instance. This
causes issues when doing successive generations with different
generators. In this PR we create a new `LlamaSequenceGenerator` instance
every time we create a new generator, and store the logits processor in
that instance, which solves the issue.

Fixes #700.
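As a rough illustration of that pattern (simplified, not the actual outlines code; `make_regex_processor` below is a hypothetical placeholder):

```python
# Simplified sketch of the pattern described above, not the actual outlines code.

class LlamaCpp:
    """Shared model wrapper: it no longer holds a logits processor."""
    def __init__(self, model):
        self.model = model


class LlamaSequenceGenerator:
    """Created once per generator; owns the logits processor for its sequences."""
    def __init__(self, model: LlamaCpp, logits_processor):
        self.model = model
        self.logits_processor = logits_processor

    def __call__(self, prompt: str, **kwargs):
        # The processor is passed explicitly on every call, so two generators
        # built from the same model can no longer overwrite each other's state.
        return self.model.model(prompt, logits_processor=self.logits_processor, **kwargs)


def make_regex_processor(pattern: str):
    """Placeholder for the real regex-guided processor construction."""
    def processor(input_ids, logits):
        return logits  # the real processor would mask tokens that break the regex
    return processor


def regex_generator(model: LlamaCpp, pattern: str) -> LlamaSequenceGenerator:
    return LlamaSequenceGenerator(model, make_regex_processor(pattern))
```
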
When integrating Outlines with vLLM I faced the following issues, which
are fixed in this PR:

1. When calling `vllm.LLM.generate` then within the internals of vLLM a
`copy.deepcopy` of the vLLM `SamplingParams` is made, which includes the
logits processor from Outlines (`RegexLogitsProcessor`, say). This
requires everything to be pickleable, and the
`RegexLogitsProcessor.fsm.vocabulary` is a `dict_values` object, which
doesn't satisfy that. The fix is easy: just convert it to a list. This
doesn't affect how the `vocabulary` variable is used in the code (see the
sketch after this list).
2. The `RegexLogitsProcessor` takes an `llm` argument, which the
docstring states should be a `vllm.LLM` object, but then attempts to
extract the underlying tokenizer via `llm.tokenizer.tokenizer`. The
tokenizer of `vllm.LLM` currently lies in the
`vllm.LLM.llm_engine.tokenizer.tokenizer` attribute, but this is a big
mess and isn't backwards compatible with previous vLLM versions.
Instead, they have a convenience method, `vllm.LLM.get_tokenizer`, which
fetches the tokenizer. To maintain backwards compatibility, in case people
have supplied `vllm.LLM.llm_engine` directly into
`RegexLogitsProcessor`, it falls back to a `tokenizer` or
`tokenizer.tokenizer` attribute.
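A simplified sketch of both fixes; these helpers are illustrative only, not the functions in the outlines codebase, where the logic lives inside `RegexLogitsProcessor` itself:

```python
# Illustrative helpers only; the real fixes live inside RegexLogitsProcessor.

def get_vllm_tokenizer(llm):
    """Fetch the tokenizer from a vllm.LLM, or fall back for callers that
    passed llm_engine (or something tokenizer-like) directly."""
    if hasattr(llm, "get_tokenizer"):
        # vllm.LLM's convenience method; avoids reaching into engine internals.
        return llm.get_tokenizer()
    tokenizer = getattr(llm, "tokenizer", llm)
    # Some wrappers nest the actual tokenizer one level deeper.
    return getattr(tokenizer, "tokenizer", tokenizer)


def make_vocabulary_picklable(fsm):
    """vLLM deep-copies SamplingParams, so the logits processor (and its FSM)
    must be picklable; dict_values is not, a plain list is."""
    fsm.vocabulary = list(fsm.vocabulary)
    return fsm
```
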

I also updated the vLLM example script, as that was outdated as well
(used the previous `_patched_apply_logits_processors`).

Closes #704

A recent change replaced the set of FSM final states with the state -1
that is used to represent an EOS token being generated. This could
explain the issue reported in #605.
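For context, a minimal illustration of that representation; this is illustrative only, not the actual outlines FSM code:

```python
# Illustrative only, not the actual outlines FSM code.
EOS_STATE = -1  # sentinel state: an EOS token has been generated


def is_finished(state: int) -> bool:
    # Earlier code membership-tested a set of FSM final states; the change
    # described above checks for the EOS sentinel instead.
    return state == EOS_STATE
```
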
…pported by exllama (#729)

Refactored the exl2 function in exllamav2.py.

The new version offers the following benefits:
1. Auto split support. You no longer need to split a large model over 2
GPUs manually; exllama will do it for you.
2. 8-bit cache support. The 8-bit cache can squeeze more context onto the
same GPU.
3. Additional exllamav2 improvements. Supports low_mem and fasttensors.
4. num_experts no longer needs to be passed in; it is optional.
5. Future support for the 4-bit cache. Whenever turbo updates the pip
package, uncomment the 4-bit lines for 4-bit support.
6. Refactored the function parameters. The model_kwargs dictionary was
replaced with individual parameters (see the sketch below). Combined with
documentation, this makes it easier for new users to understand what
options they can select.
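To make the last point concrete, a hypothetical signature sketch; the parameter names are illustrative rather than the exact ones in `exllamav2.py`:

```python
from typing import Optional

# Hypothetical signature sketch: individual, documented parameters instead of a
# single model_kwargs dict. Names are illustrative, not the exact outlines API.
def exl2(
    model_path: str,
    max_seq_len: Optional[int] = None,   # context length; defaults to the model's own
    gpu_split: str = "auto",             # auto split: no more manual splitting across GPUs
    cache_8bit: bool = False,            # 8-bit cache squeezes more context onto the same GPU
    # cache_4bit: bool = False,          # uncomment once the pip package ships the 4-bit cache
    low_mem: bool = False,
    fasttensors: bool = False,
    num_experts_per_token: Optional[int] = None,  # now optional
):
    """Load an exl2-quantized model; each option is documented individually."""
    ...
```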