coremltools 8.0b2
Pre-release
Release Notes
- Support for Latest Dependencies
  - Compatible with the latest `protobuf` python package: improves serialization latency.
  - Compatible with `numpy 2.0`.
  - Supports `scikit-learn 1.5`.
- New Core ML model utils
  - `coremltools.models.utils.bisect_model` can break a large Core ML model into two smaller models of similar size.
  - `coremltools.models.utils.materialize_dynamic_shape_mlmodel` can convert a flexible input shape model into a static input shape model.
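For example, chunking a model on disk could look like the sketch below. This is a minimal sketch only: the model path and output directory are placeholder names, and the exact keyword arguments accepted by `bisect_model` are described in the API reference.

```python
import coremltools as ct

# Minimal sketch: split a large .mlpackage into two chunks of roughly
# equal size. Both paths below are hypothetical placeholders.
ct.models.utils.bisect_model(
    "my_large_model.mlpackage",  # source model (path to an .mlpackage)
    "./bisected_models/",        # output directory for the two chunk models
)
```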
- New compression features in `coremltools.optimize.coreml`
  - Vector palettization: by setting `cluster_dim > 1` in `coremltools.optimize.coreml.OpPalettizerConfig`, you can perform vector palettization, where each entry in the lookup table is a vector of length `cluster_dim`.
  - Palettization with per-channel scale: by setting `enable_per_channel_scale=True` in `coremltools.optimize.coreml.OpPalettizerConfig`, weights are normalized along the output channel using per-channel scales before being palettized.
  - Joint compression: a new pattern is supported, where weights are first quantized to int8 and then palettized into an n-bit lookup table with int8 entries.
  - Support conversion of palettized models with an 8-bit LUT produced from `coremltools.optimize.torch`.
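A minimal sketch of how these options can be combined is shown below; the model path, `mode`, `nbits`, and `cluster_dim` values are illustrative only.

```python
import coremltools as ct
import coremltools.optimize.coreml as cto

# Load an existing mlprogram model (placeholder path).
mlmodel = ct.models.MLModel("my_model.mlpackage")

# Vector palettization with per-channel scale: each LUT entry is a vector
# of length cluster_dim, and weights are scaled per output channel before
# being palettized.
op_config = cto.OpPalettizerConfig(
    mode="kmeans",
    nbits=4,
    cluster_dim=4,
    enable_per_channel_scale=True,
)
config = cto.OptimizationConfig(global_config=op_config)
compressed_mlmodel = cto.palettize_weights(mlmodel, config)
```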
- New compression features / bug fixes in `coremltools.optimize.torch`
  - Added conversion support for Torch models jointly compressed using the training-time APIs in `coremltools.optimize.torch`.
  - Added vector palettization support to `SKMPalettizer`.
  - Fixed a bug in the construction of weight vectors along the output channel for vector palettization with `PostTrainingPalettizer` and `DKMPalettizer`.
  - Deprecated the `cluster_dtype` option in favor of `lut_dtype` in `ModuleDKMPalettizerConfig`.
  - Added support for quantizing `ConvTranspose` modules with `PostTrainingQuantizer` and `LinearQuantizer` (see the sketch after this list).
  - Added static grouping for the activation heuristic in `GPTQ`.
  - Fixed a bug in how quantization scales are computed for `Conv2d` layers with per-block quantization in `GPTQ`.
  - Can now perform activation-only quantization with `QAT` APIs.
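As an illustration of the `ConvTranspose` support, here is a minimal sketch of the training-time `LinearQuantizer` prepare/step/finalize flow; the toy model, config values, and input shape are placeholders, and a real workflow would run fine-tuning or calibration between `prepare` and `finalize`.

```python
import torch
from coremltools.optimize.torch.quantization import (
    LinearQuantizer,
    LinearQuantizerConfig,
)

# Illustrative model containing a ConvTranspose2d module.
model = torch.nn.Sequential(
    torch.nn.ConvTranspose2d(3, 8, kernel_size=2, stride=2),
    torch.nn.ReLU(),
)

config = LinearQuantizerConfig.from_dict(
    {"global_config": {"quantization_scheme": "symmetric"}}
)
quantizer = LinearQuantizer(model, config)

example_input = torch.rand(1, 3, 16, 16)
prepared_model = quantizer.prepare(example_inputs=(example_input,))

# A real workflow would run several training / calibration steps here,
# calling quantizer.step() once per step.
quantizer.step()
prepared_model(example_input)

quantized_model = quantizer.finalize()
```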
- Experimental `torch.export` conversion support (see the example code in the Appendix)
  - Support conversion of stateful models with mutable buffers.
  - Support conversion of models with dynamic input shapes.
  - Support conversion of 4-bit weight compression models.
  - Support new torch ops: `clip`.
- Various other bug fixes, enhancements, clean-ups, and optimizations.
- Special thanks to our external contributors for this release: @dpanshu, @timsneath, @kasper0406, @lamtrinhdev, @valfrom
Appendix
- Example code of converting a stateful `torch.export` model:

```python
import torch
import coremltools as ct


class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.register_buffer("state_1", torch.tensor([0.0, 0.0, 0.0]))

    def forward(self, x):
        # In-place update of the model state.
        self.state_1.mul_(x)
        return self.state_1 + 1.0


source_model = Model()
source_model.eval()

example_inputs = (torch.tensor([1.0, 2.0, 3.0]),)
exported_model = torch.export.export(source_model, example_inputs)

coreml_model = ct.convert(exported_model, minimum_deployment_target=ct.target.iOS18)
```
- Example code of converting `torch.export` models with dynamic input shapes:

```python
import torch
import coremltools as ct


class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(3, 5)

    def forward(self, x):
        y = self.linear(x)
        return y


source_model = Model()
source_model.eval()

example_inputs = (torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),)
dynamic_shapes = {"x": {0: torch.export.Dim(name="batch_dim")}}
exported_model = torch.export.export(source_model, example_inputs, dynamic_shapes=dynamic_shapes)

coreml_model = ct.convert(exported_model)
```
- Example code of converting a `torch.export` model with 4-bit weight compression:

```python
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

import coremltools as ct


class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(3, 5)

    def forward(self, x):
        y = self.linear(x)
        return y


source_model = Model()
source_model.eval()
example_inputs = (torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),)

pre_autograd_graph = capture_pre_autograd_graph(source_model, example_inputs)

# Restrict weights to the signed 4-bit range [-8, 7].
quantization_config = get_symmetric_quantization_config(weight_qmin=-8, weight_qmax=7)
quantizer = XNNPACKQuantizer().set_global(quantization_config)
prepared_graph = prepare_pt2e(pre_autograd_graph, quantizer)
converted_graph = convert_pt2e(prepared_graph)

exported_model = torch.export.export(converted_graph, example_inputs)
coreml_model = ct.convert(exported_model, minimum_deployment_target=ct.target.iOS17)
```