Commit

Improve T5 encoder tests with more prompts and static context length
The set of prompts is not big enough for statistically sound testing of
the T5 encoder; the same is true for the other text encoders.
With the expanded prompt set, the bf16 numerical difference between
eager and IREE vanished. IREE is in fact even more accurate.

In the tests, the tokenizer padding has been changed to always produce
max-length token sequences. This matches how T5 is used in the Flux
pipeline. The T5 encoder export has been extended with an option to
export with a static token sequence length.
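
As a rough illustration of max-length padding (not the repository's test code; the checkpoint name and max length below are assumptions), a Hugging Face tokenizer can be asked to always pad to the maximum length:

from transformers import AutoTokenizer

# Assumed checkpoint and sequence length; the actual tests may configure these differently.
tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-small")
batch = tokenizer(
    ["The horse went into the river", "Make the batch size 4"],
    padding="max_length",  # always pad to max_length instead of to the longest prompt
    max_length=64,
    truncation=True,
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # torch.Size([2, 64]) regardless of prompt length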

The tests were refactored to share tolerance values for f32 and bf16.
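
A minimal sketch of sharing per-dtype tolerances between tests (illustrative only; the table name, values, and helper are not the repository's actual code):

import torch

# Hypothetical shared tolerance table, keyed by dtype; values are placeholders.
ENCODER_TOLERANCES = {
    torch.float32: dict(atol=1e-5, rtol=1e-4),
    torch.bfloat16: dict(atol=1e-2, rtol=1.6e-2),
}

def assert_encoder_close(actual: torch.Tensor, expected: torch.Tensor):
    # Pick tolerances from the shared table so the f32 and bf16 tests stay consistent.
    torch.testing.assert_close(actual, expected, **ENCODER_TOLERANCES[expected.dtype])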
sogartar committed Feb 17, 2025
1 parent e39df9b commit 7226258
Showing 3 changed files with 96 additions and 98 deletions.
38 changes: 21 additions & 17 deletions sharktank/sharktank/models/t5/export.py
@@ -27,6 +27,7 @@ def export_encoder_mlir(
     model: Union[T5Encoder, Path, str],
     batch_sizes: list[int],
     mlir_output_path: str,
+    dynamic_context_length: bool = True,
 ):
     """
     Args:
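
A hedged usage sketch of the new flag (assuming the import path matches the file location; the model path below is a placeholder):

from sharktank.models.t5.export import export_encoder_mlir

t5_model = "/path/to/t5/model"  # placeholder; per the signature a T5Encoder instance or Path also works

# Default behavior: dynamic context length.
export_encoder_mlir(t5_model, batch_sizes=[4], mlir_output_path="t5_encoder_dynamic.mlir")

# New option: export with a static (fixed) token sequence length.
export_encoder_mlir(
    t5_model,
    batch_sizes=[4],
    mlir_output_path="t5_encoder_static.mlir",
    dynamic_context_length=False,
)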
@@ -44,23 +45,26 @@ def export_encoder_mlir(
     for batch_size in batch_sizes:
         sample_inputs = model.sample_inputs(batch_size)

-        context_length_dim_idx = 1
-        assert (
-            sample_inputs["input_ids"].shape[context_length_dim_idx]
-            % config.context_length_padding_block_size
-            == 0
-        )
-        context_length_block_dim_max = (
-            sample_inputs["input_ids"].shape[context_length_dim_idx]
-            // config.context_length_padding_block_size
-        )
-        context_length_block_dim = torch.export.Dim(
-            "block", max=context_length_block_dim_max
-        )
-        context_length_dim = (
-            config.context_length_padding_block_size * context_length_block_dim
-        )
-        dynamic_shapes = {"input_ids": {context_length_dim_idx: context_length_dim}}
+        if dynamic_context_length:
+            context_length_dim_idx = 1
+            assert (
+                sample_inputs["input_ids"].shape[context_length_dim_idx]
+                % config.context_length_padding_block_size
+                == 0
+            )
+            context_length_block_dim_max = (
+                sample_inputs["input_ids"].shape[context_length_dim_idx]
+                // config.context_length_padding_block_size
+            )
+            context_length_block_dim = torch.export.Dim(
+                "block", max=context_length_block_dim_max
+            )
+            context_length_dim = (
+                config.context_length_padding_block_size * context_length_block_dim
+            )
+            dynamic_shapes = {"input_ids": {context_length_dim_idx: context_length_dim}}
+        else:
+            dynamic_shapes = None

         @fxb.export_program(
             name=f"forward_bs{batch_size}",
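
For context, the dynamic/static distinction above mirrors plain torch.export usage; here is a self-contained sketch with a toy module (the block size and bound are made up, not the encoder's configuration):

import torch

class Toy(torch.nn.Module):
    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return input_ids * 2

block_size = 16  # stand-in for config.context_length_padding_block_size
sample_input_ids = torch.zeros(4, 4 * block_size, dtype=torch.long)

# Dynamic context length: dim 1 stays dynamic, constrained to whole padding blocks.
block = torch.export.Dim("block", max=4)
dynamic_program = torch.export.export(
    Toy(),
    (sample_input_ids,),
    dynamic_shapes={"input_ids": {1: block_size * block}},
)

# Static context length: no dynamic_shapes, so the sample sequence length (64) is baked in.
static_program = torch.export.export(Toy(), (sample_input_ids,))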
4 changes: 4 additions & 0 deletions sharktank/sharktank/utils/testing.py
@@ -344,4 +344,8 @@ def decorator(test_item: Callable):
     "The horse went into the river",
     "We need at least one sentence long enough so that it spans more than one padding block which by default is of size 16.",
     "Make the batch size 4",
+    "In data analysis, cosine similarity is a measure of similarity between two non-zero vectors defined in an inner product space. Cosine similarity is the cosine of the angle between the vectors; that is, it is the dot product of the vectors divided by the product of their lengths. It follows that the cosine similarity does not depend on the magnitudes of the vectors, but only on their angle.",
+    'Lexical tokenization is conversion of a text into (semantically or syntactically) meaningful lexical tokens belonging to categories defined by a "lexer" program. In case of a natural language, those categories include nouns, verbs, adjectives, punctuations etc. In case of a programming language, the categories include identifiers, operators, grouping symbols and data types. Lexical tokenization is related to the type of tokenization used in large language models (LLMs) but with two differences. First, lexical tokenization is usually based on a lexical grammar, whereas LLM tokenizers are usually probability-based. Second, LLM tokenizers perform a second step that converts the tokens into numerical values.',
+    "A large language model (LLM) is a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text.\nThe largest and most capable LLMs are generative pretrained transformers (GPTs). Modern models can be fine-tuned for specific tasks or guided by prompt engineering. These models acquire predictive power regarding syntax, semantics, and ontologies inherent in human language corpora, but they also inherit inaccuracies and biases present in the data they are trained in",
+    "Predictive learning is a machine learning (ML) technique where an artificial intelligence model is fed new data to develop an understanding of its environment, capabilities, and limitations. This technique finds application in many areas, including neuroscience, business, robotics, and computer vision. This concept was developed and expanded by French computer scientist Yann LeCun in 1988 during his career at Bell Labs, where he trained models to detect handwriting so that financial companies could automate check processing.",
 ]