[fix] Layer positions labelling & layernorm #348

Merged
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Removed duplicated biases in the FusedMLP layers [#317]
- Rotary embeddings respecting input types [#326]
- Poolformer style instantiating useless projection layers [#349]
- Fix layer position not being properly tracked, causing extra layernorms for programmatic xformers [#348]

### Added
- Four blocksparsity layouts from DeepSpeed [#320]
5 changes: 1 addition & 4 deletions xformers/factory/block_factory.py
@@ -175,10 +175,7 @@ def __init__(self, config: xFormerEncoderConfig, **kwargs):
# Optional patch embedding
self.patch_emb: Optional[nn.Module] = None

-        if (
-            config.patch_embedding_config is not None
-            and config.layer_position.is_first()
-        ):
+        if config.patch_embedding_config is not None:
            self.patch_emb = build_patch_embedding(
                PatchEmbeddingConfig(**config.patch_embedding_config)
            )

Comment from the contributor (author) on the removed condition:

> This was not respecting the config while trying to do the right thing: if the config asked for a patch embedding but the layer was not first, the embedding would not be instantiated. In retrospect I think that is risky, since it does not do what the API says it will do, and it only worked in practice because is_first() was often wrong. I now think it is better to respect the config no matter what, and not silently diverge.
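A minimal sketch of the behavioural difference, using stand-in names (`FakeEncoderConfig`, `build_patch_emb_*`) rather than the real xformers classes: with the old check, a config that asks for a patch embedding could silently end up without one whenever the layer was not labelled as first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeEncoderConfig:
    # Stand-in for xFormerEncoderConfig; only the fields relevant here.
    patch_embedding_config: Optional[dict] = None
    is_first: bool = True  # stands in for config.layer_position.is_first()


def build_patch_emb_old(config: FakeEncoderConfig) -> Optional[str]:
    # Old behaviour: gated on the (often wrong) layer position label.
    if config.patch_embedding_config is not None and config.is_first:
        return "patch_embedding"
    return None


def build_patch_emb_new(config: FakeEncoderConfig) -> Optional[str]:
    # New behaviour: the config is respected unconditionally.
    if config.patch_embedding_config is not None:
        return "patch_embedding"
    return None


cfg = FakeEncoderConfig(patch_embedding_config={"in_channels": 3}, is_first=False)
assert build_patch_emb_old(cfg) is None                # silently diverged from the config
assert build_patch_emb_new(cfg) == "patch_embedding"   # does what the config asks
```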
Expand Down
6 changes: 4 additions & 2 deletions xformers/factory/model_factory.py
@@ -149,10 +149,12 @@ def __init__(
for i in range(config.num_layers):
# Label where this layer is in the stack
# (for instance useful for the positional encoding, or late layer norm)
-            if i > 0:
+            if len(recipient) > 0:
                config.layer_position.mark_not_first()
-            if i < config.num_layers - 1:
+
+            if config != stack_configs[-1] or i < config.num_layers - 1:
                config.layer_position.mark_not_last()

Comment from the contributor (author) on the removed `if i > 0:` check:

> This would count within the repeated layers, not the overall layer stack.

block = builder(config) # type: ignore

# If reversible: extract the reversible sub-parts, else append the block as-is
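A toy reproduction of the labelling bug, using plain booleans instead of the real `LayerPosition` and builder machinery (the names below are illustrative, not xformers APIs): with two stacked configs of two layers each, counters that reset per config mislabel the stack, which is what produced the extra final layer norms and misplaced "first layer" behaviour.

```python
def label_old(layers_per_config):
    # Old logic: i counts within each repeated block, so every block's
    # first layer looks "first" and every block's last layer looks "last".
    labels = []
    for num_layers in layers_per_config:
        for i in range(num_layers):
            is_first = not (i > 0)
            is_last = not (i < num_layers - 1)
            labels.append((is_first, is_last))
    return labels


def label_new(layers_per_config):
    # Fixed logic: positions are judged against the overall layer stack.
    labels = []
    total = sum(layers_per_config)
    for num_layers in layers_per_config:
        for i in range(num_layers):
            is_first = len(labels) == 0           # first layer of the whole stack
            is_last = len(labels) == total - 1    # last layer of the whole stack
            labels.append((is_first, is_last))
    return labels


print(label_old([2, 2]))  # [(True, False), (False, True), (True, False), (False, True)]
print(label_new([2, 2]))  # [(True, False), (False, False), (False, False), (False, True)]
```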