Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster cpu ops #1434

Merged
merged 13 commits into from
Sep 26, 2024
Merged

Faster cpu ops #1434

merged 13 commits into from
Sep 26, 2024

Conversation

awni
Copy link
Member

@awni awni commented Sep 24, 2024

  • Use a contiguous iterator for faster copy, unary, binary, ternary
  • Some fixes for large arrays (use size_t in most places now)
  • Remove some dim specializations to reduce bloat and binary size
  • Add vector specialization for ternary
  • Apply dimension collapsing for all the above ops

Benchmarks

Everything is way faster up to arbitrary dimensions. A few benchmarks:

Bench Pre Post
Copy 7D 5.811 5.867
Copy 8D 11.734 11.017
Binary 273.774 5.9
Unary 16.071 0.850

Note the copy is about the same because it had a 7D specialization. Now it only has up to 3D specialized.

mx.set_default_device(mx.cpu)

def copy(x):
    return x[..., ::-1]

def binary(x, y):
    return x + y

def unary(x):
    return mx.negative(x)

D = 10
x = mx.zeros((D, D, D, D, D, D, D))
x = mx.transpose(x, (1, 0, 4, 2, 3, 6, 5))
timeit(copy, x)

y = mx.zeros((D, D, D, D, D, D, D))
timeit(binary, x, y)

z = mx.random.uniform(shape=(12,)*6)[..., ::2]
mx.eval(unary(z))
timeit(unary, z)

Copy link
Member

@angeloskath angeloskath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took some reading cause this moves some code that was added to fix an oob access in the copy. I am pretty sure it just moved further down in copy so no regression there let me know if I missed something.

Otherwise it looks great. Love to see all those red lines in the diff.

@awni
Copy link
Member Author

awni commented Sep 26, 2024

It took some reading cause this moves some code that was added to fix an oob access in the copy.

Which code are you referring to? I can double check that I didn't mess that up.

0,
0,
CopyType::General);
copy_inplace(in, out, CopyType::General);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the diff I was referring to. Git commit 03cf033 .

dst,
data_shape,
i_strides,
make_contiguous_strides<StrideT>(data_shape),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it should be fine due to that.

@awni awni merged commit 5b6f38d into main Sep 26, 2024
4 checks passed
@awni awni deleted the faster_cpu_ops branch September 26, 2024 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants