Some minor optimizations #483

pacak · 2024-03-12T18:34:23Z

Add a benchmark (that works on stable toolchain) to establish the baseline
Disable timing by default, enable when needed
Disable execution counting by default

after:

    newton  fastest       │ slowest       │ median        │ mean          │ samples │ iters
    ╰─ run  4.9 µs        │ 171.4 µs      │ 5.25 µs       │ 6.218 µs      │ 4723907 │ 4723907

before:

    newton  fastest       │ slowest       │ median        │ mean          │ samples │ iters
    ╰─ run  6.317 µs      │ 171.8 µs      │ 6.729 µs      │ 7.75 µs       │ 3807698 │ 3807698
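
For reference, the relative improvement implied by the two tables can be worked out from the reported numbers. The sketch below is only arithmetic on those figures; the helper name `reduction` is made up for illustration and is not part of the PR.

```rust
/// Fractional reduction in time when going from `before` to `after`
/// (hypothetical helper, not part of argmin).
fn reduction(before: f64, after: f64) -> f64 {
    (before - after) / before
}

fn main() {
    // (label, before in µs, after in µs), copied from the benchmark tables.
    let rows = [
        ("fastest", 6.317, 4.9),
        ("median", 6.729, 5.25),
        ("mean", 7.75, 6.218),
    ];
    for (label, before, after) in rows {
        let pct = 100.0 * reduction(before, after);
        println!("{label:8} {pct:.1}% less time per run");
    }
}
```

On these figures the median and the fastest run both drop by a bit over a fifth, and the mean by slightly less.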

Fixes #482

pacak · 2024-03-12T18:39:22Z

The commit that adds the benchmark is not necessary; I can drop it or change it to use criterion.

pacak · 2024-03-12T20:17:04Z

~~I'll add some docs in a second, but I'm not sure what the failures are about.~~

Should be working now.

stefan-k

Thanks a lot for your work! I have a few comments. I understand why you made set_counting part of the State trait, but I'd rather not expose this functionality that way. Is it possible to make it part of the interface of the various states, but not the State trait?
This essentially highlights some of the issues I have with the current design of state handling (#478). I will have to think about it, but that may take a bit.

Btw, have you tried whether making line 224 in executor.rs optional helps? Unfortunately I can't make a review comment there because the PR doesn't touch this code. I mean this line:

state.func_counts(&self.problem);

I understand that it does not remove all counts handling, but I'm just interested how that affects performance.

Finally I would prefer to not have the benchmark as part of this PR, but you can of course keep it in the PR until it is ready to be merged.

crates/argmin/src/core/executor.rs

crates/argmin/src/core/state/mod.rs

stefan-k · 2024-03-13T07:29:34Z

crates/argmin/src/core/state/iterstate.rs

-            *count = v
+        if self.counting_enabled {
+            for (k, &v) in problem.counts.iter() {
+                let count = self.counts.entry(k.to_string()).or_insert(0);


By the way, have you been able to find out whether hashing is the problem or the allocation of .to_string()? In case it was the latter, I was wondering if Cow would help?
If it's hashing, I was wondering if replacing HashMap<String, u64> with HashMap<SomeEnum, u64> would help, where

    enum SomeEnum {
        CostCount,
        GradientCount,
        OperatorCount,
        JacobianCount,
        HessianCount,
        AnnealCount,
        Other(String),
    }

(or something along those lines). That would cover all the function calls in argmin, while still allowing one to count function calls defined in external code via Other. It also avoids .to_string().
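
A minimal sketch of the enum-keyed map proposed here, assuming the variant list from the comment. The helper `bump` is hypothetical and this is not argmin's actual implementation; it only shows that the built-in counters need no String allocation for their keys, while Other still accepts arbitrary names.

```rust
use std::collections::HashMap;

/// Key type following the proposal in the comment (illustrative only).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum SomeEnum {
    CostCount,
    GradientCount,
    OperatorCount,
    JacobianCount,
    HessianCount,
    AnnealCount,
    Other(String),
}

/// Hypothetical helper: add one to the counter stored under `key`.
fn bump(counts: &mut HashMap<SomeEnum, u64>, key: SomeEnum) {
    *counts.entry(key).or_insert(0) += 1;
}

fn main() {
    let mut counts = HashMap::new();

    // Built-in counters: the key is a plain variant, no allocation needed.
    bump(&mut counts, SomeEnum::CostCount);
    bump(&mut counts, SomeEnum::CostCount);
    bump(&mut counts, SomeEnum::GradientCount);

    // Counters defined in external code still work, at the cost of a String.
    bump(&mut counts, SomeEnum::Other("my_custom_call".to_string()));

    println!("{counts:?}");
}
```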

As far as I remember it's 35% allocation and 65% hashing. HashMap is DoS resistant by default. Changing the hash function, or switching to BTreeMap combined with the enum approach you propose, might help a bit I guess, but this changes the public API.
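
To illustrate what "changing the hash function" could look like, here is a sketch that plugs a small FNV-1a hasher into the standard HashMap through `BuildHasherDefault`. The names `Fnv1a` and `FastMap` are made up for this example, the choice of FNV-1a is an assumption rather than something proposed in the thread, and such a hasher gives up the DoS resistance mentioned above. It also does nothing about the allocation share of the overhead.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Minimal 64-bit FNV-1a hasher: simple and cheap, but not DoS resistant.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// HashMap with the default SipHash replaced by the hasher above.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

fn main() {
    let mut counts: FastMap<String, u64> = FastMap::default();
    *counts.entry("cost_count".to_string()).or_insert(0) += 1;
    *counts.entry("cost_count".to_string()).or_insert(0) += 1;
    *counts.entry("gradient_count".to_string()).or_insert(0) += 1;
    println!("{counts:?}");
}
```

Whether this helps in practice would have to be measured with the same benchmark.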

I don't mind changing the public API in that case. Why do you propose a BTreeMap instead of a HashMap? Hashing an enum should be quick (I expect it to use its integer representation as a hash with a bit of additional overhead for hashing Other(String)). Anyway I think this goes beyond this PR and can certainly be left as a future improvement.

pacak · 2024-03-13T13:25:32Z

> Is it possible to make it part of the interface of the various states, but not the State trait?

It should be possible as long as I don't add "enabled if observers are present". Since you suggest removing that logic, I can make it part of each state's interface. Will try that today.

> Btw, have you tried whether making line 224 in executor.rs optional helps?
> I understand that it does not remove all counts handling, but I'm just interested how that affects performance.

This removes about half of the overhead related to counting, I think. Better than nothing.

> Finally I would prefer to not have the benchmark as part of this PR, but you can of course keep it in the PR until it is ready to be merged.

Absolutely, I just wanted to show what exactly I'm measuring. Will drop before merging.

pacak · 2024-03-13T20:20:39Z

Made a change to the comments
Renamed set_counting to counting and moved it to the corresponding states instead of the State trait
Removed the logic that enables counting when an observer is present
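
The shape of that change can be sketched as follows. `SketchState` is a hypothetical stand-in, not argmin's real state type, and the exact signatures in the PR may differ. The point is that the switch lives on the concrete state as a builder-style method, and the count update becomes a no-op unless it was turned on.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a solver state; not argmin's actual type.
#[derive(Default)]
struct SketchState {
    counting_enabled: bool,
    counts: HashMap<String, u64>,
}

impl SketchState {
    /// Builder-style switch on the concrete state; counting is off by default.
    fn counting(mut self, enabled: bool) -> Self {
        self.counting_enabled = enabled;
        self
    }

    /// Copy the problem's counters into the state, but only when enabled.
    fn func_counts(&mut self, problem_counts: &HashMap<&'static str, u64>) {
        if self.counting_enabled {
            for (k, &v) in problem_counts.iter() {
                let count = self.counts.entry(k.to_string()).or_insert(0);
                *count = v;
            }
        }
    }
}

fn main() {
    let problem_counts = HashMap::from([("cost_count", 3_u64)]);

    // Default: nothing is hashed or allocated during the update.
    let mut quiet = SketchState::default();
    quiet.func_counts(&problem_counts);
    println!("disabled: {:?}", quiet.counts);

    // Opt in explicitly when the counts are needed.
    let mut counted = SketchState::default().counting(true);
    counted.func_counts(&problem_counts);
    println!("enabled:  {:?}", counted.counts);
}
```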

stefan-k

Apologies for the very late reply; unfortunately I wasn't able to get to it earlier.

crates/argmin/src/core/executor.rs

stefan-k · 2024-03-26T06:44:40Z

crates/argmin/src/core/state/iterstate.rs

I don't mind changing the public API in that case. Why do you propose a BTreeMap instead of a HashMap? Hashing an enum should be quick (I expect it to use its integer representation as a hash with a bit of additional overhead for hashing Other(String)). Anyway I think this goes beyond this PR and can certainly be left as a future improvement.

pacak · 2024-03-26T13:31:21Z

> Hashing an enum should be quick (I expect it to use its integer representation as a hash with a bit of additional overhead for hashing Other(String)).

By default HashMap uses SipHash 1-3 with a random seed, which is slow. What's more, the underlying SwissTable doesn't perform well when the values are not random. Either way, I don't mind which implementation is used as long as it can be disabled. To minimize the overhead of running with it enabled, and if changing the API is not a problem, I'd look into the enum_map crate. The SomeEnum you proposed above can then be parametrized with some type T for custom measurements by external users. enum_map maps values into an array, so it should be pretty fast. But yeah, this is outside the scope of this pull request.
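
The array-backed idea can be illustrated without any external crate. The sketch below is hand-rolled and does not use enum_map's actual API; `Counter`, `Counts` and `N_COUNTERS` are hypothetical names, and the type parameter T for user-defined measurements is left out. Each variant doubles as an index, so an update is a plain array write with no hashing and no allocation.

```rust
/// Hypothetical fixed set of counters; each variant doubles as an array index.
#[derive(Clone, Copy, Debug)]
enum Counter {
    Cost = 0,
    Gradient = 1,
    Hessian = 2,
}

/// Number of variants in `Counter`; must be kept in sync by hand in this sketch.
const N_COUNTERS: usize = 3;

/// Counts stored in a fixed-size array indexed by the enum discriminant.
#[derive(Default, Debug)]
struct Counts([u64; N_COUNTERS]);

impl Counts {
    fn increment(&mut self, c: Counter) {
        self.0[c as usize] += 1;
    }

    fn get(&self, c: Counter) -> u64 {
        self.0[c as usize]
    }
}

fn main() {
    let mut counts = Counts::default();
    counts.increment(Counter::Cost);
    counts.increment(Counter::Cost);
    counts.increment(Counter::Gradient);

    for c in [Counter::Cost, Counter::Gradient, Counter::Hessian] {
        println!("{c:?}: {}", counts.get(c));
    }
}
```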

pacak · 2024-03-26T13:32:55Z

New changes:

I removed the benchmark
Adding an observer will not enable timing
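
As a rough illustration of "disabled by default, enabled when needed" for timing, the sketch below only touches the clock when a flag is set. `Runner` and its `timing` field are hypothetical and much simpler than argmin's executor, which this does not reproduce.

```rust
use std::time::{Duration, Instant};

/// Hypothetical runner; `timing` stays off unless explicitly requested.
struct Runner {
    timing: bool,
}

impl Runner {
    /// Run `step` `iters` times; return the elapsed time only if timing is on.
    fn run<F: FnMut()>(&self, iters: u64, mut step: F) -> Option<Duration> {
        // The clock is read only when timing was requested.
        let start = self.timing.then(Instant::now);
        for _ in 0..iters {
            step();
        }
        start.map(|t| t.elapsed())
    }
}

fn main() {
    let mut acc = 0_u64;

    let untimed = Runner { timing: false }.run(1_000, || acc += 1);
    println!("timing off: {untimed:?}");

    let timed = Runner { timing: true }.run(1_000, || acc += 1);
    println!("timing on:  {timed:?}");
}
```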

pacak · 2024-03-29T13:14:14Z

Fixed the failing test. FWIW I'm getting test failures around the places I never touched...

This one is related to floating point accuracy - probably EPSILON is too small..

$ cargo test -p argmin --all-features

---- solver::gaussnewton::gaussnewton_linesearch::tests::test_next_iter_regression stdout ----
thread 'solver::gaussnewton::gaussnewton_linesearch::tests::test_next_iter_regression' panicked at crates/argmin/src/solver/gaussnewton/gaussnewton_linesearch.rs:445:9:
assert_relative_eq!(state.param.as_ref().unwrap()[1], 2.25f64, epsilon = f64::EPSILON)

    left  = 2.2499999999999964
    right = 2.25


note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    solver::gaussnewton::gaussnewton_linesearch::tests::test_next_iter_regression

Not sure what this is about at all

$ cargo test --all-features
   Compiling argmin-math v0.4.0 (/home/pacak/ej/argmin/crates/argmin-math)
error[E0277]: the trait bound `num_complex::Complex<f32>: ndarray_linalg::Lapack` is not satisfied
  --> crates/argmin-math/src/ndarray_m/inv.rs:18:13
   |
18 |             Array2<$t>: Inverse,
   |             ^^^^^^^^^^^^^^^^^^^ the trait `ndarray_linalg::Lapack` is not implemented for `num_complex::Complex<f32>`
...
38 | make_inv!(Complex<f32>);
   | ----------------------- in this macro invocation
   |
   = help: the following other types implement trait `ndarray_linalg::Lapack`:
             f32
             f64
             nalgebra::Complex<f32>
             nalgebra::Complex<f64>
   = note: required for `ArrayBase<OwnedRepr<num_complex::Complex<f32>>, ndarray::Dim<[usize; 2]>>` to implement `Inverse`
   = help: see issue #48214
   = note: this error originates in the macro `make_inv` (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0277]: the trait bound `num_complex::Complex<f64>: ndarray_linalg::Lapack` is not satisfied
  --> crates/argmin-math/src/ndarray_m/inv.rs:18:13
   |
18 |             Array2<$t>: Inverse,
   |             ^^^^^^^^^^^^^^^^^^^ the trait `ndarray_linalg::Lapack` is not implemented for `num_complex::Complex<f64>`
...
39 | make_inv!(Complex<f64>);
   | ----------------------- in this macro invocation
   |
   = help: the following other types implement trait `ndarray_linalg::Lapack`:
             f32
             f64
             nalgebra::Complex<f32>
             nalgebra::Complex<f64>
   = note: required for `ArrayBase<OwnedRepr<num_complex::Complex<f64>>, ndarray::Dim<[usize; 2]>>` to implement `Inverse`
   = help: see issue #48214
   = note: this error originates in the macro `make_inv` (in Nightly builds, run with -Z macro-backtrace for more info)

For more information about this error, try `rustc --explain E0277`.
error: could not compile `argmin-math` (lib) due to 2 previous errors
warning: build failed, waiting for other jobs to finish...
error: could not compile `argmin-math` (lib test) due to 2 previous errors

stefan-k · 2024-03-29T13:43:46Z

> Fixed the failing test. FWIW I'm getting test failures around the places I never touched...

I suspect you need to rebase onto main. There was a Rust version update and that typically causes new clippy lints to fail, but that was fixed in #488.

> This one is related to floating point accuracy - probably EPSILON is too small..
> $ cargo test -p argmin --all-features
> ...

We've had this before -- it typically only fails locally, not in the CI. We couldn't figure out why ... if it does fail in the CI too I'm happy to increase the epsilon, but judging from the current state of the CI run, it seems to pass.
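
To put numbers on that failure: the two values from the test output differ by roughly 3.6e-15, which is an order of magnitude above f64::EPSILON (about 2.2e-16). The sketch below uses a hypothetical absolute-tolerance helper, `within_abs`, and does not model the full behavior of the assert_relative_eq! macro. The looser tolerance of 1e-12 is an arbitrary choice for illustration, not a value suggested in the thread.

```rust
/// Hypothetical absolute-tolerance check (not the `approx` crate's macro).
fn within_abs(left: f64, right: f64, eps: f64) -> bool {
    (left - right).abs() <= eps
}

fn main() {
    // Values copied from the failing test output.
    let left = 2.2499999999999964_f64;
    let right = 2.25_f64;
    let diff = (left - right).abs();

    println!("diff           = {diff:e}");
    println!("f64::EPSILON   = {:e}", f64::EPSILON);
    println!("diff / EPSILON = {}", diff / f64::EPSILON);

    // The difference exceeds the tolerance used in the test...
    assert!(!within_abs(left, right, f64::EPSILON));
    // ...but fits comfortably inside a looser, arbitrarily chosen one.
    assert!(within_abs(left, right, 1e-12));
}
```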

> Not sure what this is about at all

That's really strange... but it seems to pass in the CI.

I hope to be able to do another review today! :)

codecov-commenter · 2024-03-29T13:44:57Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.09%. Comparing base (d5e1f3c) to head (79a274a).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #483      +/-   ##
==========================================
- Coverage   92.11%   92.09%   -0.03%     
==========================================
  Files         178      178              
  Lines       24398    24455      +57     
==========================================
+ Hits        22475    22521      +46     
- Misses       1923     1934      +11


pacak · 2024-03-29T13:47:18Z

> I suspect you need to rebase onto main.

Done. In fact I have scripts to automagically rebase onto origin/master, then someone smart decided to rename the default branch.

stefan-k

LGTM, thanks a lot!

pacak force-pushed the main branch 2 times, most recently from a642ad4 to 9821c93 on March 12, 2024 20:50

stefan-k requested changes Mar 13, 2024

pacak force-pushed the main branch from 9821c93 to 16df1c5 on March 13, 2024 19:55

stefan-k requested changes Mar 26, 2024

pacak force-pushed the main branch from 16df1c5 to 40464a5 on March 26, 2024 13:14

pacak force-pushed the main branch from 40464a5 to aaf4def on March 29, 2024 13:10

pacak added 2 commits March 29, 2024 09:44

Disable timing by default, enable when needed (306297e)

Disable counting by default (79a274a)

pacak force-pushed the main branch from aaf4def to 79a274a on March 29, 2024 13:45

stefan-k approved these changes Mar 30, 2024

stefan-k merged commit a9e3eb7 into argmin-rs:main Mar 30, 2024
31 checks passed
