various changes to path machinery #61

jcmgray · 2018-10-01T00:50:50Z

Description

In the process of comparing/benchmarking the new 'cheap' path (#60) I thought various consistency/convenience updates to the path machinery might be worthwhile. Some of these are bigger than others so definitely let me know what you think.

Per #55 (Make no memory limit the default?), change the memory_limit default to unlimited. As I mentioned in the other thread, I think any time the arrays are going to get too big for RAM, resorting to einsum is going to be so exponentially slow that a MemoryError might be preferable. We could add 'warn'/'system' versions that take system memory into account; the only catch is that this requires checking the output dtype and getting the system's free memory, which is a bit of overhead. This change does make 'optimal' a bit slower, but at the same time, it possibly wasn't obvious previously that the path found might not have been globally optimal. Happy to split/delay this into a later PR if necessary (@dgasmith?).
Deprecated (with a warning) the path= keyword argument in contract_path in favour of optimize=. Other functions, including those in numpy, use optimize, so I think it makes sense to have a matching signature between contract, contract_path, & contract_expression.
Factor the full path information into a class (PathInfo) and update it to allow printing flop costs >1e307 (which previously errored). This allows access to the various path costs programmatically (e.g. path.opt_cost, path.largest_intermediate etc.) and makes the default repr the full printed info, so there is no need to print(path). This is a breaking change if anyone was using the path as a proper string (e.g. @fritzo I saw you were regexing this) but ultimately should negate the need to do that (and a cheap fix is just str(path)).
Add helper function oe.helpers.rand_equation. This is just a private function that generates large random expressions with variable connectivity, and thus might be useful for testing.
Finally, the cupy tests weren't working - I think it needs to be imported explicitly, and before other GPU libs, in order to initialize CUDA properly, so I've done that.
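
To illustrate the PathInfo item above, here is a minimal sketch of a class that exposes costs as attributes and uses the full report as its repr. It is not the code in this PR: the class name PathInfoSketch is hypothetical, and using Decimal to cope with very large flop counts is an assumption about one way to avoid the float overflow.

```python
from decimal import Decimal


class PathInfoSketch(object):
    """Hypothetical stand-in for PathInfo; not the actual opt_einsum class."""

    def __init__(self, naive_cost, opt_cost, largest_intermediate):
        # Assumption: store costs as Decimal. Formatting a Python int larger
        # than the float range with '{:.3e}' raises OverflowError, while a
        # Decimal of the same size formats without converting to float.
        self.naive_cost = Decimal(naive_cost)
        self.opt_cost = Decimal(opt_cost)
        self.largest_intermediate = Decimal(largest_intermediate)
        self.speedup = self.naive_cost / self.opt_cost

    def __repr__(self):
        lines = [
            "    Naive FLOP count:  {:.3e}".format(self.naive_cost),
            "Optimized FLOP count:  {:.3e}".format(self.opt_cost),
            " Theoretical speedup:  {:.3e}".format(self.speedup),
            "Largest intermediate:  {:.3e} elements".format(self.largest_intermediate),
        ]
        return "\n".join(lines)


info = PathInfoSketch(naive_cost=10**400, opt_cost=10**5, largest_intermediate=10**3)
print(info)           # the repr is the full report
print(info.opt_cost)  # costs are plain attributes, so no regex is needed
```

With this shape, code that previously parsed the printed string can read the attributes directly, and str(info) still gives the old text form.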

TODO:

Note the memory_limit choice in the docs somewhere?
Test the rand_equation generator / update some tests to use it?

Status

Ready to go

fix cupy a bit

codecov-io · 2018-10-01T00:59:24Z

Codecov Report

Merging #61 into master will decrease coverage by 0.16%.
The diff coverage is 91.46%.

fritzo

👍 It will be great to have access to the PathInfo object!

fritzo · 2018-10-01T01:39:50Z

opt_einsum/contract.py

+            path_run = (self.scale_list[n], do_blas, einsum_str, remaining_str)
+            path_print += "\n{:>4} {:>14} {:>22} {:>37}".format(*path_run)
+
+        return path_print


nit: you could avoid quadratic growth by using lines.append(...) and finally return '\n'.join(lines)
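
A minimal sketch of that suggestion, reusing the row format string from the diff. The function name format_rows and the sample rows are made up for illustration.

```python
def format_rows(rows):
    """Build a multi-line table from (scaling, blas, current, remaining) tuples."""
    lines = []
    for row in rows:
        # Appending to a list is cheap; repeated `+=` on a string may copy
        # the whole accumulated string on every iteration.
        lines.append("{:>4} {:>14} {:>22} {:>37}".format(*row))
    # A single join at the end builds the final string in one pass.
    return "\n".join(lines)


# Hypothetical rows, only to exercise the formatting.
rows = [
    (3, "GEMM", "ab,bc->ac", "ac,cd->ad"),
    (3, "GEMM", "ac,cd->ad", "ad->ad"),
]
print(format_rows(rows))
```

Note that join places the newline between rows, whereas the diff prefixes each row with one, so a header line would need to go into the same list to keep the output identical.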

fritzo · 2018-10-01T01:42:24Z

opt_einsum/contract.py

@@ -15,6 +15,68 @@
 __all__ = ["contract_path", "contract", "format_const_einsum_str", "ContractExpression", "shape_only", "shape_only"]


+class PathInfo:


nit: Python 2 convention is to always inherit from object:

class PathInfo(object): ...

jcmgray · 2018-10-01T11:06:43Z

@fritzo, great, I've made those couple of changes - thanks for pointing them out.

dgasmith

Overall, LGTM! Thanks for the changes.

dgasmith · 2018-10-01T13:07:00Z

opt_einsum/contract.py

+            "  Optimized FLOP count:  {:.3e}\n".format(opt_cost),
+            "   Theoretical speedup:  {:3.3f}\n".format(speedup),
+            "  Largest intermediate:  {:.3e} elements\n".format(largest_intermediate),
+            "-" * 80 + "\n",


+1 for format. If you're thinking about it, can we replace the old % syntax elsewhere in the code so we are consistent? This is an ancillary point; we can spin it off into its own issue.

dgasmith · 2018-10-01T13:07:43Z

opt_einsum/contract.py

-    path_type = kwargs.pop('path', 'auto')
+    if 'path' in kwargs:
+        import warnings
+        warnings.warn("The 'path' keyword argument is deprecated in favor of 'optimize'.", DeprecationWarning)


Agree, this is a good change overall.
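
For reference, a self-contained sketch of this deprecation pattern. The wrapper contract_path_sketch is hypothetical and only shows the keyword handling; the assumption that the old keyword keeps working and that 'auto' is the fallback is taken from the removed line in the hunk above, not from the merged code.

```python
import warnings


def contract_path_sketch(*operands, **kwargs):
    """Hypothetical wrapper showing only the keyword handling."""
    if 'path' in kwargs:
        warnings.warn(
            "The 'path' keyword argument is deprecated in favor of 'optimize'.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this function
        )
    # Assumption: the old keyword is still honoured, with 'optimize' as fallback.
    optimize = kwargs.pop('path', kwargs.pop('optimize', 'auto'))
    return optimize


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    old_style = contract_path_sketch("ab,bc->ac", path="greedy")
    new_style = contract_path_sketch("ab,bc->ac", optimize="greedy")

print(old_style, new_style, len(caught))
```

Only the call using path= triggers a warning, so existing callers keep working while being nudged towards optimize=.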

dgasmith · 2018-10-01T13:10:45Z

opt_einsum/contract.py

-        By default (None) will size the ``memory_limit`` as the largest input tensor.
-        Users can also specify ``-1`` to allow arbitrarily large tensors to be built.
+
+        - if None or -1, there is no limit.


Can we change the text style here to follow more like:

- 'None' or -1 means there is no limit
- 'max_input' means the limit is set as the size of the largest input tensor
...
The default is `None`.
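
The semantics described in that docstring could be sketched as below. The helper resolve_memory_limit is hypothetical and only mirrors the options named in this thread; it is not the function used in opt_einsum.

```python
def resolve_memory_limit(memory_limit, input_sizes):
    """Hypothetical helper: map a memory_limit argument to a max element count.

    Returns None when there is no limit.
    """
    if memory_limit is None or memory_limit == -1:
        return None  # no limit at all
    if memory_limit == 'max_input':
        return max(input_sizes)  # cap at the largest input tensor
    return int(memory_limit)  # otherwise treat it as an explicit element count


sizes = [100, 2500, 40]  # made-up element counts of three input tensors
print(resolve_memory_limit(None, sizes))
print(resolve_memory_limit('max_input', sizes))
print(resolve_memory_limit(1000, sizes))
```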

dgasmith · 2018-10-01T13:11:31Z

opt_einsum/helpers.py

+
+    Examples
+    --------
+    >>> eq, shapes = rand_equation(n=10, reg=4, n_outer=5, seed=42)


Should be rand_equation(10, 4, 5, seed=42) I think?

This works fine for me leaving them all as keyword arguments?

dgasmith · 2018-10-01T13:12:08Z

opt_einsum/helpers.py

+    reg : int
+        Average connectivity of graph.
+    n_outer : int
+        Number of outer indices.


Can you expand on outer indices?

dgasmith · 2018-10-01T13:12:47Z

opt_einsum/helpers.py

+    n : int
+        Number of array arguments.
+    reg : int
+        Average connectivity of graph.


Not sure what reg is meant to symbolize here, perhaps write this word out?

dgasmith · 2018-10-01T13:14:00Z

opt_einsum/helpers.py

@@ -167,3 +168,95 @@ def flop_count(idx_contraction, inner, num_terms, size_dictionary):
        op_factor += 1

    return overall_size * op_factor
+
+
+def rand_equation(n, reg, n_outer, dmin=2, dmax=9, seed=None):


There might be a slight mismatch between naming styles here (dmin, nouter vs. d_min, n_outer). I think your current nomenclature is the lesser of the evils, but it's worth thinking about.

dgasmith · 2018-10-01T13:20:28Z

It might be good to write a few tests that use the random path tech with a static seed and compare that the results are the same across all paths.

dgasmith · 2018-10-01T20:25:39Z

opt_einsum/tests/test_contract.py

+@pytest.mark.parametrize("reg", [3, 4])
+@pytest.mark.parametrize("n_out", [0, 2, 4])
+def test_rand_equation(optimize, n, reg, n_out):
+    eq, shapes = helpers.rand_equation(n, reg, n_out, d_min=2, d_max=5)


Did we want to add a seed to prevent irreproducible tests? Alternatively, we could leave the random tests in as long as we print the seed on failure so we can reproduce later.

Thanks for spotting I missed this - fixed now.
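
A small sketch of the two options discussed here: fixing the seed, and reporting it on failure. The generator toy_rand_shapes is a made-up stand-in and does not reproduce what rand_equation does.

```python
import random


def toy_rand_shapes(n, d_min=2, d_max=5, seed=None):
    """Hypothetical stand-in for a random generator; returns n random 2D shapes."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    return [(rng.randint(d_min, d_max), rng.randint(d_min, d_max)) for _ in range(n)]


def check_reproducible(seed):
    first = toy_rand_shapes(4, seed=seed)
    second = toy_rand_shapes(4, seed=seed)
    # Including the seed in the message lets a failing case be replayed later.
    assert first == second, "shapes differ for seed={}".format(seed)
    return first


shapes = check_reproducible(seed=42)
print(len(shapes))
```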

dgasmith · 2018-10-01T20:26:18Z

As mentioned elsewhere I think we should try to get this in before #60.

dgasmith

I think this looks good to go once the tests complete. Let me know if there are any additional holdups.

jcmgray · 2018-10-02T12:09:12Z

Great, I'll merge.

path finding changes

8fec17a

fix cupy a bit

fritzo reviewed Oct 1, 2018

PathInfo: inherit from object and use join for repr

f8bdece

dgasmith reviewed Oct 1, 2018

fritzo mentioned this pull request Oct 1, 2018

Add prototype cheap optimizer #60

Merged

5 tasks

jcmgray added 3 commits October 1, 2018 19:05

update docstrings and var names

9d540b4

update string formatting

ddd47b7

test rand_equation

809a3b1

dgasmith reviewed Oct 1, 2018

set seed for rand_equation test

f297d5f

dgasmith approved these changes Oct 1, 2018

jcmgray merged commit ccbdf6b into dgasmith:master Oct 2, 2018

dgasmith added this to the v2.3 milestone Oct 3, 2018

dgasmith added the enhancement label Oct 3, 2018
