Add ModernBERT to Transformers #35158

Merged
merged 91 commits into from
Dec 19, 2024
Changes from 67 commits
6b5a823
initial cut of modernbert for transformers
warner-benjamin Dec 9, 2024
dafb203
small bug fixes
warner-benjamin Dec 10, 2024
df13def
fixes
warner-benjamin Dec 11, 2024
d09eabf
Update import
tomaarsen Dec 11, 2024
8c3afea
Use compiled mlp->mlp_norm to match research implementation
tomaarsen Dec 11, 2024
a40aaa9
Propagate changes in modular to modeling
tomaarsen Dec 11, 2024
9f0b8ca
Replace duplicate attn_out_dropout in favor of attention_dropout
tomaarsen Dec 11, 2024
900d8ec
Update BOS to CLS and EOS to SEP
tomaarsen Dec 11, 2024
caf8901
Set default classifier bias to False, matching research repo
tomaarsen Dec 11, 2024
8276602
Update tie_word_embeddings description
tomaarsen Dec 11, 2024
79e4bbb
Fix _init_weights for ForMaskedLM
tomaarsen Dec 11, 2024
b59bad9
Match base_model_prefix
tomaarsen Dec 11, 2024
e7bef53
Add compiled_head to match research repo outputs
tomaarsen Dec 11, 2024
120578b
Fix imports for ModernBertForMaskedLM
tomaarsen Dec 11, 2024
142ff11
Just use "gelu" default outright for classifier
tomaarsen Dec 11, 2024
b44abdc
Fix config name typo: initalizer -> initializer
tomaarsen Dec 11, 2024
3de8ebf
Remove some unused parameters in docstring. Still lots to edit there!
tomaarsen Dec 11, 2024
7a05b3f
Compile the embeddings forward
tomaarsen Dec 12, 2024
88b0ecf
Add drafts for ForSequenceClassification/ForTokenClassification
tomaarsen Dec 12, 2024
5e3d61d
Add initial SDPA support (not exactly equivalent to FA2 yet!)
tomaarsen Dec 12, 2024
2a3d378
Only use attention dropout if training
tomaarsen Dec 12, 2024
a2051d6
Add initial eager attention support (also not equivalent to FA2 yet!)
tomaarsen Dec 12, 2024
124f1fd
Add initial tests, output_attentions, output_hidden_states, prune_heads
tomaarsen Dec 13, 2024
38f959b
Remove kwargs from ModernBertForMaskedLM
tomaarsen Dec 13, 2024
f716943
Remove/adjust/skip improper tests; warn if padding but no attn mask
tomaarsen Dec 13, 2024
f41adaa
Run formatting etc.
tomaarsen Dec 13, 2024
d06654a
Run python utils/custom_init_isort.py
tomaarsen Dec 14, 2024
f9301f4
FlexAttention with unpadded sequences(matches FA2 within bf16 numerics)
staghado Dec 15, 2024
a356708
Reformat init_weights based on review
tomaarsen Dec 16, 2024
f83fdc0
self -> module in attention forwards
tomaarsen Dec 16, 2024
b444c15
Remove if config.tie_word_embeddings
tomaarsen Dec 16, 2024
5aaf273
Reformat output projection on a different line
tomaarsen Dec 16, 2024
0a8d044
Remove pruning
tomaarsen Dec 16, 2024
382e481
Remove assert
tomaarsen Dec 16, 2024
5d05e8e
Call contiguous() to simplify paths
tomaarsen Dec 16, 2024
98508c7
Remove prune_qkv_linear_layer
tomaarsen Dec 16, 2024
2c076c8
Format code
tomaarsen Dec 16, 2024
986c6fe
Keep as kwargs, only use if needed
tomaarsen Dec 16, 2024
5cd39ad
Remove unused codepaths & related config options
tomaarsen Dec 16, 2024
2d606b9
Remove 3d attn_mask test; fix token classification tuple output
tomaarsen Dec 16, 2024
8eb87e8
Reorder: attention_mask above position_ids, fixes gradient checkpointing
tomaarsen Dec 16, 2024
5d83c56
Merge branch 'main' into pr-35158
tomaarsen Dec 16, 2024
3a24af4
Fix usage if no FA2 or torch v2.5+
tomaarsen Dec 16, 2024
37a6030
Make torch.compile/triton optional
tomaarsen Dec 17, 2024
b3b4028
Separate pooling options into separate functions (cls, mean) - cls as…
tomaarsen Dec 17, 2024
b241a7e
Simplify _pad_modernbert_output, remove unused labels path
tomaarsen Dec 17, 2024
66f4603
Update tied weights to remove decoder.weight, simplify decoder loading
tomaarsen Dec 17, 2024
3eb786b
Adaptively set config.compile based on hf_device_map/device/resize, etc.
tomaarsen Dec 17, 2024
093b601
Merge branch 'main' of https://github.com/huggingface/transformers in…
tomaarsen Dec 17, 2024
28fc79e
Update ModernBertConfig docstring
tomaarsen Dec 17, 2024
612befa
Satisfy some consistency checks, add unfinished docs
tomaarsen Dec 17, 2024
ae32e8b
Merge branch 'main' of https://github.com/huggingface/transformers in…
tomaarsen Dec 17, 2024
f4e280a
Only set compile to False if there's more than 1 device
tomaarsen Dec 17, 2024
bc14967
Add docstrings for public ModernBert classes
tomaarsen Dec 17, 2024
0f17fb9
Dont replace docstring returns - ends up being duplicate
tomaarsen Dec 17, 2024
25b12b4
Fix mistake in toctree
tomaarsen Dec 17, 2024
f312eef
Reformat toctree
tomaarsen Dec 17, 2024
1e367df
Patched FlexAttention, SDPA, Eager with Local Attention
tomaarsen Dec 17, 2024
fb748ce
Implement FA2 -> SDPA -> Eager attn_impl defaulting, crucial
tomaarsen Dec 17, 2024
051233f
Patch test edge case with Idefics3 not working with 'attn_implementat…
tomaarsen Dec 17, 2024
6c01711
Repad all_hidden_states as well
tomaarsen Dec 17, 2024
5f7c566
rename config.compile to reference_compile
warner-benjamin Dec 18, 2024
c8a80e7
disable flex_attention since it crashes
warner-benjamin Dec 18, 2024
8962f05
Update modernbert.md
bclavie Dec 18, 2024
7e89f4d
Using dtype min to mask in eager
NohTow Dec 18, 2024
0742a1d
Fully remove flex attention for now
tomaarsen Dec 18, 2024
6c6cddb
Call contiguous to allow for .view()
tomaarsen Dec 18, 2024
e37e4ec
Copyright 2020 -> 2024
tomaarsen Dec 18, 2024
9afc480
Update/simplify __init__ structure
tomaarsen Dec 18, 2024
aa1bdb4
Remove "... if dropout_prob > 0 else identity"
tomaarsen Dec 18, 2024
659807f
re-use existing pad/unpad functions instead of creating new ones
staghado Dec 18, 2024
7955e39
remove flexattention method
staghado Dec 18, 2024
4145119
Compute attention_mask and local_attention_mask once in modeling
tomaarsen Dec 18, 2024
0e572d5
Simplify sequence classification prediction heads, only CLS now
tomaarsen Dec 18, 2024
e5dca63
Simplify module.training in eager attn
tomaarsen Dec 18, 2024
bf11173
Also export ModernBertPreTrainedModel
tomaarsen Dec 18, 2024
54ed5db
Update the documentation with links to finetuning scripts
tomaarsen Dec 18, 2024
a1bfae8
Explain local_attention_mask parameter in docstring
tomaarsen Dec 18, 2024
df7658a
Simplify _autoset_attn_implementation, rely on super()
tomaarsen Dec 18, 2024
b3404ed
Keep "in" to initialize Prediction head
tomaarsen Dec 18, 2024
e057bc2
add back mean pooling
warner-benjamin Dec 18, 2024
99c38ba
Use the pooling head in TokenClassification
warner-benjamin Dec 18, 2024
5114ed7
update copyright
warner-benjamin Dec 18, 2024
175fb95
Reset config._attn_implementation_internal on failure
tomaarsen Dec 18, 2024
8cedfc5
Allow optional attention_mask in ForMaskedLM head
warner-benjamin Dec 18, 2024
2380729
fix failing run_slow tests
warner-benjamin Dec 18, 2024
7686134
Add links to the paper
tomaarsen Dec 19, 2024
44275fd
Remove unpad_no_grad, always pad/unpad without gradients
tomaarsen Dec 19, 2024
d799d65
local_attention_mask -> sliding_window_mask
tomaarsen Dec 19, 2024
ed77867
Revert "Use the pooling head in TokenClassification"
tomaarsen Dec 19, 2024
92e17c6
Simplify pooling, 2 options via if-else
tomaarsen Dec 19, 2024
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
Original file line number Diff line number Diff line change
Expand Up @@ -496,6 +496,8 @@
title: mLUKE
- local: model_doc/mobilebert
title: MobileBERT
- local: model_doc/modernbert
title: ModernBert
- local: model_doc/mpnet
title: MPNet
- local: model_doc/mpt
Expand Down
1 change: 1 addition & 0 deletions docs/source/en/index.md
Expand Up @@ -231,6 +231,7 @@ Flax), PyTorch, and/or TensorFlow.
| [MobileNetV2](model_doc/mobilenet_v2) | ✅ | ❌ | ❌ |
| [MobileViT](model_doc/mobilevit) | ✅ | ✅ | ❌ |
| [MobileViTV2](model_doc/mobilevitv2) | ✅ | ❌ | ❌ |
| [ModernBERT](model_doc/modernbert) | ✅ | ❌ | ❌ |
| [Moshi](model_doc/moshi) | ✅ | ❌ | ❌ |
| [MPNet](model_doc/mpnet) | ✅ | ✅ | ❌ |
| [MPT](model_doc/mpt) | ✅ | ❌ | ❌ |
Expand Down
96 changes: 96 additions & 0 deletions docs/source/en/model_doc/modernbert.md
@@ -0,0 +1,96 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# ModernBert

<div class="flex flex-wrap space-x-1">
<a href="https://huggingface.co/models?filter=modernbert">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-modernbert-blueviolet">
</a>
<!-- <a href="">
<img alt="Paper page" src="https://img.shields.io/badge/Paper%20page--green">
</a> -->
</div>

## Overview

The ModernBert model was proposed in [Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference](#) by Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal Ladhak, Tom Aarsen, Nathan Cooper, Griffin Adams, Jeremy Howard and Iacopo Poli.

It is a refresh of the traditional encoder architecture, as used in previous models such as [BERT](https://huggingface.co/docs/transformers/en/model_doc/bert) and [RoBERTa](https://huggingface.co/docs/transformers/en/model_doc/roberta).

It builds on BERT and implements many modern architectural improvements which have been developed since its original release, such as:
- [Rotary Positional Embeddings](https://huggingface.co/blog/designing-positional-encoding) to support sequences of up to 8192 tokens.
- [Unpadding](https://arxiv.org/abs/2208.08124) to ensure no compute is wasted on padding tokens, speeding up processing time for batches with mixed-length sequences.
- [GeGLU](https://arxiv.org/abs/2002.05202) layers in place of the original MLP layers, shown to improve performance.
- [Alternating Attention](https://arxiv.org/abs/2004.05150v2) where most attention layers employ a sliding window of 128 tokens, with Global Attention only used every 3 layers.
- [Flash Attention](https://github.com/Dao-AILab/flash-attention) to speed up processing.
- A model design that follows [The Case for Co-Designing Model Architectures with Hardware](https://arxiv.org/abs/2401.14489), so that the model runs efficiently across common inference GPUs.
- Modern training data scales (2 trillion tokens) and mixtures (including code and math data).
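
The alternating attention pattern from the list above can be illustrated with a minimal pure-Python sketch. The helper names `is_global_layer` and `sliding_window_mask` are hypothetical, and both the choice of which layers count as global (every third layer, starting at layer 0) and the symmetric half-window are assumptions made for illustration rather than a description of the actual modeling code.

```python
from typing import List


def is_global_layer(layer_idx: int, global_every_n: int = 3) -> bool:
    # Assumption: layers 0, 3, 6, ... use global attention; all others are local.
    return layer_idx % global_every_n == 0


def sliding_window_mask(seq_len: int, window: int = 128) -> List[List[bool]]:
    # Assumption: token i may attend to token j when |i - j| <= window // 2.
    half = window // 2
    return [[abs(i - j) <= half for j in range(seq_len)] for i in range(seq_len)]


# Layer schedule for a hypothetical 6-layer model.
schedule = ["global" if is_global_layer(i) else "local" for i in range(6)]
print(schedule)

# A tiny window keeps the mask readable; the list above mentions 128 tokens.
mask = sliding_window_mask(seq_len=6, window=4)
print(mask[0])
```

With a window of 4, each token sees at most two neighbours on either side, while a global layer would see the whole sequence.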

The abstract from the paper is the following:

*Encoder-only transformer models such as BERT offer a great performance-size tradeoff for retrieval and classification tasks with respect to larger decoder-only models. Despite being the workhorse of numerous production pipelines, there have been limited Pareto improvements to BERT since its release. In this paper, we introduce ModernBERT, bringing modern model optimizations to encoder-only models and representing a major Pareto improvement over older encoders. Trained on 2 trillion tokens with a native 8192 sequence length, ModernBERT models exhibit state-of-the-art results on a large pool of evaluations encompassing diverse classification tasks and both single and multi-vector retrieval on different domains (including code). In addition to strong downstream performance, ModernBERT is also the most speed and memory efficient encoder and is designed for inference on common GPUs.*

The original code can be found [here](https://github.com/answerdotai/modernbert).

## Usage tips

- This implementation is similar to [`BertModel`] ...
- ModernBert doesn't have `token_type_ids`, so you don't need to indicate which token belongs to which segment.
- ModernBert is similar to BERT but with ...

## Resources

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with ModernBert. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.

<PipelineTag pipeline="sentence-similarity"/>

...

<PipelineTag pipeline="fill-mask"/>

- [Masked language modeling task guide](../tasks/masked_language_modeling)


## ModernBertConfig

[[autodoc]] ModernBertConfig

<frameworkcontent>
<pt>

## ModernBertModel

[[autodoc]] ModernBertModel
- forward

## ModernBertForMaskedLM

[[autodoc]] ModernBertForMaskedLM
- forward

## ModernBertForSequenceClassification

[[autodoc]] ModernBertForSequenceClassification
- forward

## ModernBertForTokenClassification

[[autodoc]] ModernBertForTokenClassification
- forward

</pt>
</frameworkcontent>
2 changes: 2 additions & 0 deletions docs/source/en/perf_infer_gpu_one.md
Expand Up @@ -73,6 +73,7 @@ FlashAttention-2 is currently supported for the following architectures:
* [MBart](https://huggingface.co/docs/transformers/model_doc/mbart#transformers.MBartModel)
* [Mistral](https://huggingface.co/docs/transformers/model_doc/mistral#transformers.MistralModel)
* [Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral#transformers.MixtralModel)
* [ModernBert](https://huggingface.co/docs/transformers/model_doc/modernbert#transformers.ModernBertModel)
* [Moshi](https://huggingface.co/docs/transformers/model_doc/moshi#transformers.MoshiModel)
* [Musicgen](https://huggingface.co/docs/transformers/model_doc/musicgen#transformers.MusicgenModel)
* [MusicGen Melody](https://huggingface.co/docs/transformers/model_doc/musicgen_melody#transformers.MusicgenMelodyModel)
Expand Down Expand Up @@ -263,6 +264,7 @@ For now, Transformers supports SDPA inference and training for the following arc
* [Mistral](https://huggingface.co/docs/transformers/model_doc/mistral#transformers.MistralModel)
* [Mllama](https://huggingface.co/docs/transformers/model_doc/mllama#transformers.MllamaForConditionalGeneration)
* [Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral#transformers.MixtralModel)
* [ModernBert](https://huggingface.co/docs/transformers/model_doc/modernbert#transformers.ModernBertModel)
* [Moshi](https://huggingface.co/docs/transformers/model_doc/moshi#transformers.MoshiModel)
* [Musicgen](https://huggingface.co/docs/transformers/model_doc/musicgen#transformers.MusicgenModel)
* [MusicGen Melody](https://huggingface.co/docs/transformers/model_doc/musicgen_melody#transformers.MusicgenMelodyModel)
Expand Down
18 changes: 18 additions & 0 deletions src/transformers/__init__.py
Expand Up @@ -605,6 +605,7 @@
"models.mobilenet_v2": ["MobileNetV2Config"],
"models.mobilevit": ["MobileViTConfig"],
"models.mobilevitv2": ["MobileViTV2Config"],
"models.modernbert": ["ModernBertConfig"],
"models.moshi": [
"MoshiConfig",
"MoshiDepthConfig",
Expand Down Expand Up @@ -2861,6 +2862,15 @@
"MobileViTV2PreTrainedModel",
]
)
_import_structure["models.modernbert"].extend(
[
"ModernBertForMaskedLM",
"ModernBertForSequenceClassification",
"ModernBertForTokenClassification",
"ModernBertModel",
"ModernBertPreTrainedModel",
]
)
_import_structure["models.moshi"].extend(
[
"MoshiForCausalLM",
Expand Down Expand Up @@ -5556,6 +5566,7 @@
from .models.mobilevitv2 import (
MobileViTV2Config,
)
from .models.modernbert import ModernBertConfig
from .models.moshi import (
MoshiConfig,
MoshiDepthConfig,
Expand Down Expand Up @@ -7546,6 +7557,13 @@
MobileViTV2Model,
MobileViTV2PreTrainedModel,
)
from .models.modernbert import (
ModernBertForMaskedLM,
ModernBertForSequenceClassification,
ModernBertForTokenClassification,
ModernBertModel,
ModernBertPreTrainedModel,
)
from .models.moshi import (
MoshiForCausalLM,
MoshiForConditionalGeneration,
Expand Down
17 changes: 17 additions & 0 deletions src/transformers/loss/loss_utils.py
Expand Up @@ -47,6 +47,22 @@ def ForCausalLMLoss(
    return loss


def ForMaskedLMLoss(
    logits, labels, vocab_size: int, num_items_in_batch: int = None, ignore_index: int = -100, **kwargs
):
    # Upcast to float if we need to compute the loss to avoid potential precision issues
    logits = logits.float()

    # Flatten the tokens
    logits = logits.view(-1, vocab_size)
    labels = labels.view(-1)
    # Enable model parallelism

    labels = labels.to(logits.device)
    loss = fixed_cross_entropy(logits, labels, num_items_in_batch, ignore_index, **kwargs)
    return loss


def ForSequenceClassificationLoss(labels, pooled_logits, config, **kwargs):
    num_labels = config.num_labels
    if config.problem_type is None:
Expand Down Expand Up @@ -101,6 +117,7 @@ def ForTokenClassification(logits, labels, config, **kwargs):

LOSS_MAPPING = {
    "ForCausalLM": ForCausalLMLoss,
    "ForMaskedLM": ForMaskedLMLoss,
    "ForQuestionAnswering": ForQuestionAnsweringLoss,
    "ForSequenceClassification": ForSequenceClassificationLoss,
    "ForTokenClassification": ForTokenClassification,
Expand Down
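
As a reading aid for `ForMaskedLMLoss` above, here is a minimal pure-Python sketch of cross-entropy over already-flattened token rows that skips labels equal to `ignore_index`. The function name `masked_lm_loss` is hypothetical, and the averaging rule is an assumption: this sketch takes the mean over non-ignored labels unless `num_items_in_batch` is given, in which case it divides the summed loss by that number. The behaviour of `fixed_cross_entropy` itself is not shown in this diff.

```python
import math
from typing import List, Optional


def masked_lm_loss(
    logits: List[List[float]],
    labels: List[int],
    num_items_in_batch: Optional[int] = None,
    ignore_index: int = -100,
) -> float:
    total = 0.0
    counted = 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            # Unmasked positions carry the ignore label and add nothing to the loss.
            continue
        # Numerically stable log-sum-exp over the vocabulary scores of this token.
        peak = max(row)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[label]
        counted += 1
    # Assumption: mean over counted labels unless an explicit item count is given.
    denominator = counted if num_items_in_batch is None else num_items_in_batch
    return total / denominator if denominator else float("nan")


# Three tokens over a two-word vocabulary; the middle token is ignored.
example_logits = [[0.0, 0.0], [2.0, 0.0], [5.0, 5.0]]
example_labels = [0, -100, 1]
loss = masked_lm_loss(example_logits, example_labels)
print(loss)
```

Both counted tokens have uniform scores here, so each contributes the log of the vocabulary size.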
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
Expand Up @@ -166,6 +166,7 @@
mobilenet_v2,
mobilevit,
mobilevitv2,
modernbert,
moshi,
mpnet,
mpt,
Expand Down
2 changes: 2 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
Expand Up @@ -186,6 +186,7 @@
("mobilenet_v2", "MobileNetV2Config"),
("mobilevit", "MobileViTConfig"),
("mobilevitv2", "MobileViTV2Config"),
("modernbert", "ModernBertConfig"),
("moshi", "MoshiConfig"),
("mpnet", "MPNetConfig"),
("mpt", "MptConfig"),
Expand Down Expand Up @@ -508,6 +509,7 @@
("mobilenet_v2", "MobileNetV2"),
("mobilevit", "MobileViT"),
("mobilevitv2", "MobileViTV2"),
("modernbert", "ModernBERT"),
("moshi", "Moshi"),
("mpnet", "MPNet"),
("mpt", "MPT"),
Expand Down
4 changes: 4 additions & 0 deletions src/transformers/models/auto/modeling_auto.py
Expand Up @@ -175,6 +175,7 @@
("mobilenet_v2", "MobileNetV2Model"),
("mobilevit", "MobileViTModel"),
("mobilevitv2", "MobileViTV2Model"),
("modernbert", "ModernBertModel"),
("moshi", "MoshiModel"),
("mpnet", "MPNetModel"),
("mpt", "MptModel"),
Expand Down Expand Up @@ -836,6 +837,7 @@
("mega", "MegaForMaskedLM"),
("megatron-bert", "MegatronBertForMaskedLM"),
("mobilebert", "MobileBertForMaskedLM"),
("modernbert", "ModernBertForMaskedLM"),
("mpnet", "MPNetForMaskedLM"),
("mra", "MraForMaskedLM"),
("mvp", "MvpForConditionalGeneration"),
Expand Down Expand Up @@ -990,6 +992,7 @@
("mistral", "MistralForSequenceClassification"),
("mixtral", "MixtralForSequenceClassification"),
("mobilebert", "MobileBertForSequenceClassification"),
("modernbert", "ModernBertForSequenceClassification"),
("mpnet", "MPNetForSequenceClassification"),
("mpt", "MptForSequenceClassification"),
("mra", "MraForSequenceClassification"),
Expand Down Expand Up @@ -1176,6 +1179,7 @@
("mistral", "MistralForTokenClassification"),
("mixtral", "MixtralForTokenClassification"),
("mobilebert", "MobileBertForTokenClassification"),
("modernbert", "ModernBertForTokenClassification"),
("mpnet", "MPNetForTokenClassification"),
("mpt", "MptForTokenClassification"),
("mra", "MraForTokenClassification"),
Expand Down
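
The four additions to `modeling_auto.py` register one ModernBert class name per task mapping, keyed by the model type string. A minimal sketch of that lookup follows, using only entries visible in the hunks above. The task labels (`"base"`, `"masked-lm"`, and so on) and the helper `resolve_class_name` are hypothetical and are not part of the library.

```python
from collections import OrderedDict

# One ordered mapping per task, keyed by model type. Entries are copied from the diff.
TASK_MAPPINGS = {
    "base": OrderedDict([("mobilevitv2", "MobileViTV2Model"), ("modernbert", "ModernBertModel")]),
    "masked-lm": OrderedDict([("mobilebert", "MobileBertForMaskedLM"), ("modernbert", "ModernBertForMaskedLM")]),
    "sequence-classification": OrderedDict(
        [
            ("mobilebert", "MobileBertForSequenceClassification"),
            ("modernbert", "ModernBertForSequenceClassification"),
        ]
    ),
    "token-classification": OrderedDict(
        [
            ("mobilebert", "MobileBertForTokenClassification"),
            ("modernbert", "ModernBertForTokenClassification"),
        ]
    ),
}


def resolve_class_name(task: str, model_type: str) -> str:
    # Hypothetical helper: look up the class name registered for a task and model type.
    mapping = TASK_MAPPINGS.get(task)
    if mapping is None:
        raise ValueError(f"Unknown task: {task!r}")
    if model_type not in mapping:
        raise ValueError(f"Model type {model_type!r} has no class registered for task {task!r}")
    return mapping[model_type]


print(resolve_class_name("masked-lm", "modernbert"))
```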
1 change: 1 addition & 0 deletions src/transformers/models/auto/tokenization_auto.py
Expand Up @@ -313,6 +313,7 @@
("mllama", ("LlamaTokenizer", "LlamaTokenizerFast" if is_tokenizers_available() else None)),
("mluke", ("MLukeTokenizer" if is_sentencepiece_available() else None, None)),
("mobilebert", ("MobileBertTokenizer", "MobileBertTokenizerFast" if is_tokenizers_available() else None)),
("modernbert", (None, "PreTrainedTokenizerFast" if is_tokenizers_available() else None)),
("moshi", (None, "PreTrainedTokenizerFast" if is_tokenizers_available() else None)),
("mpnet", ("MPNetTokenizer", "MPNetTokenizerFast" if is_tokenizers_available() else None)),
("mpt", (None, "GPTNeoXTokenizerFast" if is_tokenizers_available() else None)),
Expand Down
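
The `modernbert` tokenizer entry is a `(slow, fast)` pair whose slow slot is `None`, so only a fast tokenizer class is registered. The sketch below shows one plausible way to choose from such a pair. `pick_tokenizer_class` is a hypothetical helper, and its selection rule (prefer the fast class when requested or when no slow class exists) is an assumption, not a copy of the library's logic.

```python
from typing import Optional, Tuple


def pick_tokenizer_class(entry: Tuple[Optional[str], Optional[str]], use_fast: bool = True) -> str:
    slow, fast = entry
    # Assumption: use the fast class when requested, or when it is the only option.
    if fast is not None and (use_fast or slow is None):
        return fast
    if slow is not None:
        return slow
    # Both slots are None, e.g. when the fast class needs a package that is not installed.
    raise ValueError("No tokenizer class is available for this model type")


modernbert_entry = (None, "PreTrainedTokenizerFast")
mobilebert_entry = ("MobileBertTokenizer", "MobileBertTokenizerFast")

print(pick_tokenizer_class(modernbert_entry, use_fast=False))
print(pick_tokenizer_class(mobilebert_entry, use_fast=False))
```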
61 changes: 61 additions & 0 deletions src/transformers/models/modernbert/__init__.py
@@ -0,0 +1,61 @@
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import (
    OptionalDependencyNotAvailable,
    _LazyModule,
    is_torch_available,
)


_import_structure = {
    "configuration_modernbert": ["ModernBertConfig"],
}

try:
    if not is_torch_available():
        raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
    pass
else:
    _import_structure["modeling_modernbert"] = [
        "ModernBertForMaskedLM",
        "ModernBertModel",
        "ModernBertPreTrainedModel",
        "ModernBertForSequenceClassification",
        "ModernBertForTokenClassification",
    ]

if TYPE_CHECKING:
    from .configuration_modernbert import ModernBertConfig

    try:
        if not is_torch_available():
            raise OptionalDependencyNotAvailable()
    except OptionalDependencyNotAvailable:
        pass
    else:
        from .modeling_modernbert import (
            ModernBertForMaskedLM,
            ModernBertForSequenceClassification,
            ModernBertForTokenClassification,
            ModernBertModel,
            ModernBertPreTrainedModel,
        )

else:
    import sys

    sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)
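
The `_import_structure` dictionary above maps submodule names to the public names they export, and the module object is swapped for a lazy one that imports a submodule only when one of its names is first accessed. The following is a simplified sketch of that idea, not the library's `_LazyModule`: the `LazyModule` class is hypothetical, and standard-library modules stand in for the package's submodules so the example runs on its own.

```python
import importlib
from types import ModuleType
from typing import Dict, List


class LazyModule(ModuleType):
    """Hypothetical stand-in: resolves exported names on first attribute access."""

    def __init__(self, name: str, import_structure: Dict[str, List[str]]):
        super().__init__(name)
        # Invert {module: [names]} into {name: module} for quick lookup.
        self._name_to_module = {
            attr: module for module, attrs in import_structure.items() for attr in attrs
        }

    def __getattr__(self, attr: str):
        # Called only when normal attribute lookup fails, i.e. before the name is cached.
        name_to_module = self.__dict__.get("_name_to_module", {})
        if attr not in name_to_module:
            raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}")
        module = importlib.import_module(name_to_module[attr])
        value = getattr(module, attr)
        setattr(self, attr, value)  # cache so later lookups skip __getattr__
        return value


# Standard-library modules play the role of configuration_/modeling_ submodules here.
demo = LazyModule("demo", {"math": ["sqrt"], "json": ["dumps"]})
print(demo.sqrt(9.0))
print(demo.dumps({"lazy": True}))
```

Nothing is imported when `demo` is created; each target module is loaded on the first access to one of its names and the result is cached on the module object.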