EasyDeL 0.0.80 brings enhanced flexibility, expanded model support, and improved performance with the introduction of vInference and optimized GPU/TPU integration.
New Features:
- Platform and Backend Flexibility: Users can now specify the platform (e.g., TRITON) and backend (e.g., GPU) to optimize their workflows.
- Expanded Model Support: Added support for new models, including olmo2, qwen2_moe, mamba2, and others, enhancing the tool's versatility.
- Enhanced Trainers: Trainers are now more customizable and hackable, providing greater flexibility for project-specific needs.
- New Trainer Types: Introduced sequence-to-sequence trainers and sequence classification trainers to support a wider range of training tasks.
- vInference Engine: A robust inference engine for LLMs with Long-Term Support (LTS), ensuring stability and reliability.
- vInferenceApiServer: A backend for the inference engine that is fully compatible with OpenAI APIs, facilitating easy integration.
- Optimized GPU Integration: Leverages custom, direct TRITON calls for improved GPU performance, speeding up processing times.
- Dynamic Quantization Support: Added support for quantization types NF4, A8BIT, A8Q, and A4Q, enabling efficiency and scalability.
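To make the quantization options concrete, here is a toy, pure-Python sketch of the idea behind block-wise 4-bit quantization such as NF4: each block of weights is scaled by its absolute maximum, then every value is snapped to one of 16 fixed levels. This is an illustration only, not EasyDeL's implementation; real NF4 uses levels spaced for normally distributed weights, which the uniform grid below only approximates.

```python
# Toy sketch of block-wise 4-bit quantization (the idea behind NF4),
# NOT EasyDeL's implementation. Each block is scaled by its absmax,
# then values are snapped to the nearest of 16 fixed levels.

# 16 evenly spaced levels in [-1, 1]; real NF4 places its levels to
# match a normal distribution instead of this uniform grid.
LEVELS = [i / 7.5 - 1.0 for i in range(16)]

def quantize_block(block):
    """Return (absmax scale, list of 4-bit level indices) for one block."""
    scale = max(abs(x) for x in block) or 1.0
    idx = [min(range(16), key=lambda i: abs(x / scale - LEVELS[i])) for x in block]
    return scale, idx

def dequantize_block(scale, idx):
    """Reconstruct approximate weights from the scale and level indices."""
    return [LEVELS[i] * scale for i in idx]

weights = [0.31, -0.92, 0.05, 0.77, -0.18, 0.44, -0.63, 0.08]
scale, idx = quantize_block(weights)
restored = dequantize_block(scale, idx)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
# With 16 levels over [-scale, scale], rounding error is at most scale / 15.
assert max_err <= scale / 15 + 1e-9
```

A larger `quantization_block_size` (as in the loading example below, which uses 256) amortizes the per-block scale over more weights at some cost in accuracy.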
Performance Improvements:
- EasyDeL 0.0.80 has been optimized for speed and performance, with benchmarks showing improvements of over 4.9% compared to previous versions.
- The tool is now more dynamic and easier to work with, enhancing the overall user experience.
This release is a significant step forward in making EasyDeL a more powerful and flexible tool for machine learning tasks. We look forward to your feedback and continued support.
Documentation:
Comprehensive documentation is available at https://easydel.readthedocs.io/en/latest/
Example Usage:
Load any of the 40+ available models with EasyDeL:
import jax
import jax.numpy as jnp
from jax import lax
import easydel as ed
from easydel import EasyDeLBaseConfigDict

sharding_axis_dims = (1, 1, 1, -1)  # sequence sharding for better inference and training
max_length = 2**15
pretrained_model_name_or_path = "AnyEasyModel"
dtype = jnp.float16
model, params = ed.AutoEasyDeLModelForCausalLM.from_pretrained(
pretrained_model_name_or_path,
input_shape=(len(jax.devices()), max_length),
auto_shard_params=True,
sharding_axis_dims=sharding_axis_dims,
config_kwargs=EasyDeLBaseConfigDict(
use_scan_mlp=False,
attn_dtype=jnp.float16,
freq_max_position_embeddings=max_length,
mask_max_position_embeddings=max_length,
attn_mechanism=ed.AttentionMechanisms.VANILLA,
kv_cache_quantization_method=ed.EasyDeLQuantizationMethods.A8BIT,
use_sharded_kv_caching=False,
gradient_checkpointing=ed.EasyDeLGradientCheckPointers.NONE,
),
quantization_method=ed.EasyDeLQuantizationMethods.NF4,
quantization_block_size=256,
platform=ed.EasyDeLPlatforms.TRITON,
partition_axis=ed.PartitionAxis(),
param_dtype=dtype,
dtype=dtype,
precision=lax.Precision("fastest"),
)
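In the loader above, sharding_axis_dims=(1, 1, 1, -1) describes the device mesh, with -1 absorbing all remaining devices the way it does in a NumPy reshape. Assuming EasyDeL's usual (dp, fsdp, tp, sp) axis ordering, that places every device on the sequence axis. A minimal sketch of how such a tuple resolves to a concrete mesh shape; the helper below is illustrative, not part of EasyDeL's API:

```python
# Hedged sketch: resolve a sharding_axis_dims tuple like (1, 1, 1, -1)
# into a concrete device-mesh shape, the way -1 works in a reshape.
# The (dp, fsdp, tp, sp) axis order is an assumption about EasyDeL's
# convention; this helper itself is illustrative only.

def resolve_mesh_shape(axis_dims, n_devices):
    """Replace a single -1 entry with however many devices remain."""
    fixed = 1
    for d in axis_dims:
        if d != -1:
            fixed *= d
    if n_devices % fixed:
        raise ValueError("device count not divisible by fixed axes")
    return tuple(n_devices // fixed if d == -1 else d for d in axis_dims)

# With 8 devices, (1, 1, 1, -1) puts all 8 on the last (sequence) axis,
# i.e. the sequence sharding used in the example above.
print(resolve_mesh_shape((1, 1, 1, -1), 8))  # (1, 1, 1, 8)
print(resolve_mesh_shape((2, 1, 1, -1), 8))  # (2, 1, 1, 4)
```

Setting a fixed value on another axis (for example 2 on the first axis) splits the remaining devices accordingly, trading sequence parallelism for data parallelism.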
Note
This might be the last release of EasyDeL that incorporates HF/Flax modules. In future versions, EasyDeL will transition to its own base
modules and may adopt Equinox or Flax NNX, provided that NNX meets sufficient performance standards. Users are encouraged to
provide feedback on this direction.