v0.0.17: AWS Neuron SDK 2.16, Mistral, sentence transformers, inference cache
What's Changed
AWS SDK
- Use AWS Neuron SDK 2.16 (#398)
- Use official serialization API for transformers_neuronx models instead of beta by @aws-yishanm (#387, #393)
Inference
- Improve support for sentence transformers by @JingyaHuang (#408)
- Add Neuronx compile cache Hub proxy and use it for LLM decoder models by @dacorvo (#410)
- Add support for Mistral models by @dacorvo (#411)
- Do not upload Neuron LLM weights when they can be fetched from the Hub by @dacorvo (#413)
Training
- Add general support for generation on Trn1 with NxD by @aws-tianquaw (#370)
Tutorials and doc improvements
- Add Llama 2 fine-tuning tutorial by @philschmid (#390)
Major bugfixes
- Skip pushing if the user does not have write access to the cache repo by @michaelbenayoun (#405)
Other changes
- Bump Hugging Face library versions by @JingyaHuang (#403)
New Contributors
- @aws-tianquaw made their first contribution in #370
- @aws-yishanm made their first contribution in #387
Full Changelog: v0.0.16...v0.0.17