Releases · VectorInstitute/kaleidoscope
Kaleidoscope v0.12.1
What's Changed
- Added Llama3.1 model (8b and 8b_instruct variants)
- Added Phi-3 model (medium_128k_instruct variant)
Kaleidoscope v0.12.0
What's Changed
- Llama3 integration
- Focus on fast inference
- Removed activations and interpretability features
Kaleidoscope v0.11.1
What's Changed
- Llama 2 now returns complete activations for batched activation retrieval, with correct dimensions for each prompt
Kaleidoscope v0.11.0
What's Changed
- Fixed remaining performance issues with Triton-hosted models
- Implemented a Stable Diffusion image generation model (SDXL), which takes a prompt as input and returns a base-32 encoded PNG as output
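Handling the SDXL response amounts to one decode step before writing the image to disk. The sketch below mocks the service response (the `image` field name is an assumption, not the documented schema); only the base-32 decoding via the standard library's `base64.b32decode` is the point being illustrated:

```python
import base64

# Simulated service response: the model returns a base-32 encoded PNG.
# Here we fabricate the payload from the 8-byte PNG signature purely for
# illustration; a real response comes from the Kaleidoscope gateway.
png_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
response = {"image": base64.b32encode(png_bytes).decode("ascii")}  # field name is hypothetical

# Decode the base-32 payload back to raw PNG bytes before saving
decoded = base64.b32decode(response["image"])
assert decoded[:8] == b"\x89PNG\r\n\x1a\n"  # valid PNG signature

with open("sdxl_output.png", "wb") as f:
    f.write(decoded)
```

Note that base-32 (not the more common base-64) is what this release advertises, so `b32decode` is the matching call.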
Kaleidoscope v0.10.2
What's Changed
- Removed OPT and GPT-J models
- Updated Slurm submit scripts to include allocation times
- Refactored the module names config so it is set at the model variant level, not the full model type
- Refactored Falcon to use a single config file instead of splitting it across multiple files
- Added base Llama2 variants, renamed chat variants to llama2-#b_chat
Kaleidoscope v0.10.0
What's Changed
- Fixed generation bug in Llama2
- Removed batching loop from Llama2 code, all batching is now handled by Triton
- Added missing config parameter to env-example
- Fixed Llama2 module layer names
- Set Docker restart policy to always
Kaleidoscope v0.9.0
What's Changed
- Added llama2 7b model
- Implemented activation retrieval and manipulation for llama2
- Removed original llama(1) models
- Fixed a bug in OPT models where generation failed with multiple prompts
Full Changelog: v0.8.0...v0.9.0
Kaleidoscope v0.8.0
What's Changed
- Added llama 7b and 30b models
- Improved error messages
- Improved stability of loading models
Important Notes
- This release is not backwards compatible with older versions of the kaleidoscope-sdk! You'll need to update:
python3 -m pip install --force-reinstall kscope==0.8.0
Kaleidoscope v0.7.0
What's Changed
- Implemented the Pytriton backend (https://github.com/triton-inference-server/pytriton). This will allow us to greatly optimize model performance and onboard new models much more rapidly going forward.
- Added falcon-7b and falcon-40b models.
Important Note
The new Pytriton backend introduces some minor changes to the programmatic API. This will break existing code for interfacing with the Kaleidoscope service. For example:
- Model names are now lowercased, so when calling `load_model()` you'll need to use `opt-175b` instead of `OPT-175B`
- Response text coming from `generate()` requests will now be accessed at `generation['sequences']` instead of `generation['text']`
- Invalid generation arguments will now fail with an error, instead of being silently ignored
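The changes above amount to a small migration in client code. This sketch contrasts the old and new access patterns; the client setup is omitted, and the response dict is mocked for illustration rather than produced by a live gateway:

```python
# Mocked response in the new Pytriton-backed format; a real one comes
# from a generate() call through the kaleidoscope-sdk.
generation = {"sequences": ["The quick brown fox jumps over the lazy dog."]}

# Old (pre-v0.7.0): text = generation['text']   <- key no longer present
# New (v0.7.0+):
text = generation["sequences"][0]

# Model names are now lowercased when passed to load_model():
model_name = "opt-175b"  # not "OPT-175B"

print(text)
```

Code still reading `generation['text']` will raise a `KeyError` against the new backend, which makes the break explicit rather than silent.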
Kaleidoscope v0.6.0
What's Changed
- Implemented activation manipulation for the OPT class of models
- Gateway service is now hosted behind a gunicorn/nginx wrapper
- Significant code cleanup, linting, docstrings