v0.0.17: AWS Neuron SDK 2.16, Mistral, sentence transformers, inference cache
What's Changed
AWS SDK
- Use AWS Neuron SDK 2.16 (#398)
- Use official serialization API for transformers_neuronx models instead of beta by @aws-yishanm (#387, #393)
Inference
- Improve support for sentence transformers by @JingyaHuang (#408)
- Add Neuronx compile cache Hub proxy and use it for LLM decoder models by @dacorvo (#410)
- Add support for Mistral models by @dacorvo (#411)
- Do not upload Neuron LLM weights when they can be fetched from the Hub by @dacorvo (#413)
Training
- Add general support for generation on Trn1 with NxD by @aws-tianquaw (#370)
Tutorials and doc improvements
- Add Llama 2 fine-tuning tutorial by @philschmid (#390)
Major bugfixes
- Skip pushing if the user does not have write access to the cache repo by @michaelbenayoun (#405)
Other changes
- Bump Hugging Face library versions by @JingyaHuang (#403)
New Contributors
- @aws-tianquaw made their first contribution in #370
- @aws-yishanm made their first contribution in #387
Full Changelog: v0.0.16...v0.0.17