-
Every time you call `linprog`, the LP is solved from scratch. In your case, so long as it is feasible for the variables associated with your new columns to take zero values, the simplex solver can start from a feasible solution to the new LP. This is very likely to be much faster than solving the new LP from scratch. However, if it is very much faster to solve your LP using the barrier solver, then it's not possible to exploit the knowledge of the optimal solution of a related problem. Since you are looking to do more than just solve a one-off LP, I strongly advise you to call HiGHS directly. From Python this can be done with highspy. Sorry for not replying sooner: I don't get notifications of discussion items.
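As a minimal sketch of the pattern (the problem data here is illustrative, not from your model): build the model once, then add columns and re-run, so the simplex solver can pick up from the previous basis.

```python
import numpy as np
import highspy

inf = highspy.kHighsInf

h = highspy.Highs()
# One row, 0 <= x0 + x1 <= 10; its coefficients arrive with the columns below
h.addRows(1, np.array([0.0]), np.array([10.0]), 0,
          np.empty(0, dtype=np.int32), np.empty(0, dtype=np.int32),
          np.empty(0, dtype=np.double))

# First column: minimize -x0 subject to the row above
h.addCol(-1.0, 0.0, inf, 1, np.array([0], dtype=np.int32), np.array([1.0]))
h.run()  # solved from scratch

# A new column whose variable is feasible at zero: the old basis stays
# feasible, so this re-solve can hot start
h.addCol(-2.0, 0.0, inf, 1, np.array([0], dtype=np.int32), np.array([1.0]))
h.run()
```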
-
Thanks Julian, knowing that something like this should be possible is helpful by itself! I tried a comparison in which, on the one hand, I solve the LP over and over from scratch using SciPy, and on the other, I use highspy to build up the problem iteratively. What I find is that the latter approach does end up being faster overall, and is much faster when I only add a few columns, but for the bulk of the calculation, solving from scratch is in fact much faster. I checked that modifying the model itself does not contribute much to the running time compared to solving it, so that's not it. There could be some differences in how the libraries are compiled (which compiler optimization flags are used, etc.), but that seems unlikely to matter enough to explain the difference.

So, clearly I could get the best of both worlds if I figure out when to append to the existing solution rather than start from scratch, but I'm still surprised that it's not simply always faster to build up the solution; maybe I shouldn't be? As a bit of an aside, I noticed that the Python docs suggest using `h.addCols`, but that doesn't seem to be part of the highspy interface (see the comment in the code below).

Concretely, the total calculation times for the two approaches on my problem are as follows:
[Plot: total calculation time for the two approaches.]

To benchmark, I've solved the whole thing to get the final matrices, then simply truncated them afterwards to put both approaches on equal footing; that is, I've represented the problem through sparse arrays. For the SciPy approach:

```python
from scipy.optimize import linprog

for n in ns:
    A_eq = A_eq_base[:, :n]
    A_ub = A_ub_base[:, :n]
    c = c_base[:n]
    res = linprog(c=c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
```

and for the highspy approach:

```python
import numpy as np
import highspy
from scipy.sparse import vstack

inf: np.double = highspy.kHighsInf

# Combine inequality and equality constraints
lhs_ub = -np.ones_like(b_ub) * np.inf
rhs_ub = b_ub
lhs_eq = b_eq
rhs_eq = b_eq
lhs = np.concatenate((lhs_ub, lhs_eq))
rhs = np.concatenate((rhs_ub, rhs_eq))
A = vstack((A_ub_base, A_eq_base), format="csc")  # CSC, so single columns can be sliced below
numrow = A.shape[0]
h = highspy.Highs()
h.setOptionValue("log_to_console", False)
h.addRows(numrow, lhs, rhs, 0, np.empty(0, dtype=np.double), np.empty(0, dtype=np.double), np.empty(0, dtype=np.double))
for n1, n2 in zip([0] + ns, ns):
    # The docs suggest that h.addCols ought to work, but that's not part of
    # the highspy interface, so add the new columns one at a time
    for i in range(n1, n2):
        col = A[:, i]
        h.addCol(c_base[i], 0, inf, col.nnz, col.indices, col.data)
    h.run()
```
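One way to check whether the hot start is actually kicking in (my own suggestion, not something from the thread) is to compare the work done per re-solve via `getInfo()`:

```python
# Inside the loop, right after h.run(): a warm-started re-solve should need
# far fewer simplex iterations than solving the whole LP from scratch
info = h.getInfo()
print(f"n={n2}: objective={info.objective_function_value}, "
      f"simplex iterations={info.simplex_iteration_count}")
```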
-
In SciPy and HiGHS you're still using the simplex solver for everything, so the behaviour you're getting is largely to be expected. However, when solving from scratch it's likely to be very much faster to use the interior point solver.
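For reference, switching to the interior point solver is a one-line change in either interface; a sketch using the documented option and method names:

```python
# Via highspy: select HiGHS's interior point solver before h.run()
h.setOptionValue("solver", "ipm")

# Via SciPy: linprog's HiGHS interior point method
res = linprog(c=c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              method="highs-ipm")
```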
-
Yes, …
-
Thanks again; much appreciated! I repeated the exercise with each of your suggestions and ended up with the results below (redoing the two cases from before in the process), where it's implicit that "SciPy" means "starting from scratch" and "highspy" means "hot starting". Notably, when coming from SciPy, using …

In particular, this means that if I use …
-
As a bit of an aside, how much of the solver state would one have to pull out to be able to pick up progress after the process has ended; that is, to run the solver, close the process containing it, open another one a few days later, and still be able to hot start? I noticed that we have …
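For instance, if the basis were enough state, I imagine something like the following (a sketch, assuming `writeBasis`/`readBasis` from the `Highs` class are exposed in highspy):

```python
# Process 1: solve, then persist the optimal basis to disk before exiting
h.run()
h.writeBasis("lp.bas")

# Process 2, days later: rebuild the same model, reload the basis, hot start
h2 = highspy.Highs()
# ... re-add the same rows and columns as before ...
h2.readBasis("lp.bas")
h2.run()
```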
-
So, you have LPs where interior point isn't so much better than simplex, and the hot start is using dual simplex, as it would if there were any primal infeasibilities when you add the new columns.
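If you want to be sure which variant runs while experimenting, the relevant knobs are the `solver` and `simplex_strategy` options; a sketch, with strategy values as in the HiGHS options documentation:

```python
# Via highspy: force the simplex solver with the dual strategy
h.setOptionValue("solver", "simplex")
h.setOptionValue("simplex_strategy", 1)  # 1 = dual, 4 = primal

# Via SciPy: the dedicated dual simplex method
res = linprog(c=c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              method="highs-ds")
```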
-
I just realized I never spelled that out, but my solution indeed remains primal feasible after adding new columns with zero weight. So I'm still surprised that the hot-starting approach is much slower in the intermediate iterations than starting over, even though I stick to the dual simplex solver in all cases. But perhaps that's to be expected?
-
I have a large-ish LP that can be effectively solved through column generation (adding some 10,000 columns at a time over 10-20 iterations, with the subproblem itself being efficiently implemented to the point of HiGHS being the bottleneck at the moment), and I'm curious if there are any natural tricks I could apply to minimize the amount of additional work required by HiGHS in each iteration. I'm using the `scipy.optimize.linprog` interface as a wrapper for the dual simplex solver, and am passing the constraints as a sequence of growing CSC matrices, but maybe I would be better off using HiGHS internals directly?
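Concretely, each iteration currently looks roughly like this (a sketch; the matrix names are placeholders):

```python
from scipy.optimize import linprog
from scipy.sparse import csc_matrix

# Each column-generation iteration widens the constraint matrices, then
# re-solves the whole LP with the HiGHS dual simplex method
res = linprog(c, A_ub=csc_matrix(A_ub), b_ub=b_ub,
              A_eq=csc_matrix(A_eq), b_eq=b_eq, method="highs-ds")
```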