
[WIP] Int4Tensor refactor to implements pattern #458

Closed
wants to merge 6 commits

Conversation

melvinebenezer
Contributor

@melvinebenezer melvinebenezer commented Jun 28, 2024

Refactoring UInt4Tensor to use the implements pattern, similar to nf4tensor and UInt2Tensor.

ToDo

  • Create implements for UInt4Tensor and PerChannelSymmetricWeight
  • Test Cases
  • Move uint4i to uint4.py
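The "implements pattern" referred to above registers per-op overrides in a dispatch table that the tensor subclass consults from `__torch_dispatch__`. A minimal, torch-free sketch of the registry mechanics (all names here are illustrative, not torchao's actual API):

```python
# Torch-free sketch of the "implements" dispatch pattern used by tensor
# subclasses such as nf4tensor. All names below are illustrative.

UINT4_OP_TABLE = {}  # maps an op key -> its registered override function

def implements(ops):
    """Decorator factory: register `fn` as the implementation of each op in `ops`."""
    def decorator(fn):
        for op in ops:
            UINT4_OP_TABLE[op] = fn
        return fn
    return decorator

class UInt4TensorLike:
    """Stand-in for the subclass; real code dispatches from __torch_dispatch__."""
    def __init__(self, data):
        self.data = data

    @classmethod
    def dispatch(cls, op, *args, **kwargs):
        if op not in UINT4_OP_TABLE:
            raise NotImplementedError(f"no implementation registered for {op!r}")
        return UINT4_OP_TABLE[op](*args, **kwargs)

@implements(["detach", "clone"])
def uint4_copy_like(tensor):
    # one override can serve several ops, since the decorator takes a list
    return UInt4TensorLike(list(tensor.data))
```

Unregistered ops fall through to `NotImplementedError`, which is what makes the table an explicit, auditable list of supported operations.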


pytorch-bot bot commented Jun 28, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/458

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the "CLA Signed" label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) — Jun 28, 2024
@jerryzh168
Contributor

jerryzh168 commented Jun 28, 2024

@melvinebenezer thanks for working on this! Can you take a look at the Developer API section in #391? It has the structure that a new dtype tensor subclass can use. The current UInt4Tensor was created before this doc was available, so now we can indeed work on improving it to align with our plan.

Review thread on this diff hunk:

```python
        return fn
    return decorator

def _dynamically_quantize_per_channel_int4(x, quant_min, quant_max, target_dtype):
```
Contributor

also I feel this can probably be covered by AffineQuantizedTensor:

> Affine quantized tensor subclass. Affine quantization means we quantize the floating point tensor with an affine transformation.

and the previous dequantize_per_channel is calling our unified quant_primitive ops:

```python
def dequantize_per_channel(int_repr, scales, zero_points, out_dtype=torch.float32):
```
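For context, per-channel affine dequantization recovers floats as `(int_repr - zero_point) * scale` with one scale and zero point per channel. A torch-free, list-based sketch of that primitive (the real op works on tensors and also takes an `out_dtype`):

```python
def dequantize_per_channel(int_repr, scales, zero_points):
    """List-based sketch: int_repr holds one row of quantized ints per channel;
    scales and zero_points hold one value per channel.
    float_val = (quantized_val - zero_point) * scale
    """
    return [
        [(q - zp) * scale for q in row]
        for row, scale, zp in zip(int_repr, scales, zero_points)
    ]
```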

Contributor Author

Working on this

@melvinebenezer
Contributor Author

Todo

  • Use AffineQuantizedTensor
  • Convert test cases to pytest to maintain uniformity

@jerryzh168
Contributor

jerryzh168 commented Aug 30, 2024

@melvinebenezer we have recently landed the Uintx tensor subclass into affine quantized tensor: https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/Uintx.py

we now have a UintxTensor here:

```python
class UintxTensor(torch.Tensor):
```

so the uint4 tensor can probably be deprecated for now. I think moving the relevant operator implementations over to UintxTensor might be helpful (and add some tests for the new ops).
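At its core, a uintx tensor bit-packs sub-byte values; for uint4, two values share each byte. A hedged, torch-free sketch of the idea (not UintxTensor's actual layout or API):

```python
def pack_uint4(values):
    """Pack an even-length list of 4-bit values (0..15) into bytes,
    low nibble first. Illustrative only, not UintxTensor's real layout."""
    assert len(values) % 2 == 0 and all(0 <= v < 16 for v in values)
    return bytes(values[i] | (values[i + 1] << 4) for i in range(0, len(values), 2))

def unpack_uint4(packed):
    """Inverse of pack_uint4: recover the original 4-bit values."""
    out = []
    for b in packed:
        out.extend((b & 0x0F, b >> 4))
    return out
```

Round-tripping `unpack_uint4(pack_uint4(v))` is the kind of invariant the new tests could pin down.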

@melvinebenezer
Contributor Author

@jerryzh168 Sure, will make the changes

@melvinebenezer
Contributor Author

@jerryzh168 shall I close this draft and work on a new one?

@jerryzh168
Contributor

jerryzh168 commented Sep 3, 2024

> @jerryzh168 shall I close this draft and work on a new one?

yeah, opening a new one makes more sense I think. You can add tests in https://github.com/pytorch/ao/blob/main/test/dtypes/test_uintx.py

I'm not sure if it's possible to migrate all the implementations over; I'd suggest starting with a few tests, like slicing tests etc.

e.g. copy-paste the test:

```python
uintx_weight_only(dtype)(l)
```

and do: `sliced_weight = l.weight[1:2]`. I'm not sure which ops make sense or are useful right now, though, so feel free to just add a few popular ops for now. I'm also planning to put up a test suite for tensor subclasses this week.
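One property a slicing test can pin down: slicing rows of the packed weight and then unpacking should equal unpacking everything and then slicing. A torch-free sketch of that check (the real test would slice `l.weight[1:2]` on the quantized module; the helpers here are hypothetical stand-ins for a packed layout):

```python
def pack_row(row):
    """Pack one row of uint4 values (0..15, even length) into bytes, low nibble first."""
    assert len(row) % 2 == 0
    return bytes(row[i] | (row[i + 1] << 4) for i in range(0, len(row), 2))

def unpack_row(row_bytes):
    """Recover the uint4 values of one packed row."""
    out = []
    for b in row_bytes:
        out.extend((b & 0x0F, b >> 4))
    return out

def check_row_slicing(rows, start, stop):
    """Slice-then-unpack must match unpack-then-slice, and both match the source rows."""
    packed = [pack_row(r) for r in rows]
    sliced_then_unpacked = [unpack_row(r) for r in packed[start:stop]]
    unpacked_then_sliced = [unpack_row(r) for r in packed][start:stop]
    return sliced_then_unpacked == unpacked_then_sliced == rows[start:stop]
```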

another addition that might be helpful is supporting distributed inference for uintx tensors. I'm starting a PR here: #785 and will likely have something ready soon, so that you can follow it to extend UintxAQTLayout to support distributed inference.

@melvinebenezer
Contributor Author

@jerryzh168 Yes, makes total sense. Will continue the conversation in Discord and the new PR.

@melvinebenezer melvinebenezer mentioned this pull request Oct 7, 2024
4 tasks
Labels: CLA Signed
3 participants