backtest() works, but backtest_many() gives MemoryError #177

Closed · quant5 opened this issue Sep 3, 2024 · 11 comments
quant5 commented Sep 3, 2024

Specifications

Windows machine (x86). Python 3.11.5.
Fresh install of cvxportfolio at the latest version (1.3.2) in a fresh virtual environment.

Description

I am trying to run the hello_world.py example here: https://github.com/cvxgrp/cvxportfolio/blob/master/examples/hello_world.py

  • The script gets data correctly and the data files are fine.
  • The script fails on simulator.backtest_many([policy, cvx.Uniform()], start_time="2020-01-01"):
    • This is very strange, since my machine is quite large (64 GB RAM) and the memory allocation in the error is only about 1 MB.

[screenshot: MemoryError traceback]

  • If I replace backtest_many() with a simple backtest(), the script succeeds. Example: simulator.backtest(policy, start_time="2020-01-01") or simulator.backtest(cvx.Uniform(), start_time="2020-01-01")

Other remarks

  • Switching the solver to something else (e.g., "ECOS" or "CLARABEL") did not fix the issue.
  • Downgrading numpy to 1.5.x did not fix the issue.
  • Task manager does not indicate any huge spike in memory use.
  • The test suite (e.g., python -m cvxportfolio.tests) worked fine (see below).
    [screenshot: test suite output]

My thought is that a compatibility problem between multiprocess and NumPy could be the issue, but I'm not seeing anything from a quick search. Any help would be appreciated.
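For reference, here is a condensed sketch of what works vs. what fails, using the same objects as hello_world.py (the policy construction below is paraphrased from that example rather than copied verbatim, and I shortened the universe):

import cvxportfolio as cvx

if __name__ == "__main__":  # guard needed on Windows, since backtest_many spawns worker processes
    # Same objective and constraints as hello_world.py (GAMMA=2.5, KAPPA=0.05).
    objective = (
        cvx.ReturnsForecast()
        - 2.5 * (cvx.FullCovariance() + 0.05 * cvx.RiskForecastError())
        - cvx.StocksTransactionCost()
    )
    policy = cvx.MultiPeriodOptimization(
        objective, [cvx.LeverageLimit(3)], planning_horizon=2)
    simulator = cvx.StockMarketSimulator(
        universe=["AAPL", "AMZN", "GOOG", "TSLA"])

    # This works:
    simulator.backtest(policy, start_time="2020-01-01")

    # This raises MemoryError on my machine:
    simulator.backtest_many([policy, cvx.Uniform()], start_time="2020-01-01")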


enzbus commented Sep 3, 2024

Thanks for reporting this. Yes, it looks like an incompatibility between multiprocess and something on your system. Since you're on Windows, in the worst case you can try switching to a Linux virtual machine. I'll work on adding an option to revert to standard Python multiprocessing: as of a few releases ago multiprocess is no longer necessary, but I left it in because it looked more robust. Could you check, in the next ~24 hrs, whether a patch that drops multiprocess from the required dependencies and makes it optional (use it if installed, otherwise don't) fixes this issue for you? PS: I assume everything works fine if you set parallel=False in the options to backtest_many?
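For reference, the serial fallback I mean looks like this (a minimal sketch, reusing the simulator and policy objects from the snippet in the issue description above):

# parallel=False runs the back-tests sequentially in the main process,
# bypassing the multiprocessing pool entirely.
results = simulator.backtest_many(
    [policy, cvx.Uniform()],
    start_time="2020-01-01",
    parallel=False,
)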

enzbus added the bug label on Sep 3, 2024

quant5 commented Sep 3, 2024

Of course. I'm experimenting with this on my end too, but if you push out a patch I'll happily try it.


quant5 commented Sep 3, 2024

And yes, no problem with parallel=False, as expected. I should have put that in the original issue.


enzbus commented Sep 3, 2024

Ok, working on it in PR #178


quant5 commented Sep 3, 2024

Simply replacing multiprocess with multiprocessing here
https://github.com/cvxgrp/cvxportfolio/blob/master/cvxportfolio/simulator.py#L53

did not work on my end:
[screenshot: error traceback]
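Concretely, the swap I tried at the linked line was just the import (rough sketch; the exact form of the import in simulator.py may differ):

# cvxportfolio/simulator.py, around the linked line:
# from multiprocess import Lock, Pool      # current
from multiprocessing import Lock, Pool     # attempted drop-in replacement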

I am digging into the code further.


enzbus commented Sep 3, 2024

Looks like the fix passes tests; I'm going to merge, and then you can try it by installing the development version (https://www.cvxportfolio.com/en/master/#advanced-install-development-version). I'm not sure which environment manager you're using, but the syntax should be similar to pip's. It's rather strange: I had made a test case specifically for multiprocessing; maybe it passes because GitHub only provides single-processor test runners? Let me merge and we'll see.


enzbus commented Sep 3, 2024

Ok, in any case I've merged (it's always good to minimize dependencies; multiprocess was there for advanced use cases, like custom forecasters using difficult third-party libraries).


enzbus commented Sep 3, 2024

Just a thought. Perhaps @quant5 you have some Windows-specific limit on processes spawned by a master process? It could be that GitHub test runners are sanitized against this sort of thing. I recently found out that new versions of macOS have all sorts of limits on "unsigned" code, and you need to manually unset a bunch of code attributes to get a lot of open-source code running. In any case I'm happy with having dropped a dependency :)


quant5 commented Sep 3, 2024

I reverse engineered the call stack and got something to work. It turns out to be simple: setting processes=n within multiprocessing.Pool.

import pandas as pd

import cvxportfolio as cvx
from multiprocessing import Lock, Pool
from cvxportfolio.cache import _mp_init  # only the pool initializer is used below


def _worker(policy, simulator, start_time, end_time, h):
    # Module-level wrapper so Pool.starmap can dispatch the simulator's
    # internal _backtest to the worker processes.
    return simulator._backtest(policy, start_time, end_time, h)


if __name__ == "__main__":
    # risk aversion parameter (Chapter 4.2)
    # chosen to match resulting volatility with the
    # uniform portfolio (for illustrative purpose)
    GAMMA = 2.5

    # covariance forecast error risk parameter (Chapter 4.3)
    # this can help regularize a noisy covariance estimate
    KAPPA = 0.05

    objective = (
        cvx.ReturnsForecast()
        - GAMMA * (cvx.FullCovariance() + KAPPA * cvx.RiskForecastError())
        - cvx.StocksTransactionCost()
    )

    constraints = [cvx.LeverageLimit(3)]
    universe = ["AAPL", "AMZN", "UBER", "ZM", "CVX", "TSLA", "GM", "ABNB", "CTAS", "GOOG"]

    policy = cvx.MultiPeriodOptimization(objective, constraints, planning_horizon=2)
    simulator = cvx.StockMarketSimulator(universe=universe)

    policies = [policy, cvx.Uniform()]
    initial_value = 1e6
    market_data = simulator.market_data
    tz = market_data.trading_calendar().tz

    start_time = pd.Timestamp("2020-01-01").tz_localize(tz)
    end_time = pd.Timestamp("2021-01-01").tz_localize(tz)

    trading_calendar_inclusive = market_data.trading_calendar(
        start_time, end_time, include_end=True
    )
    if len(trading_calendar_inclusive) < 1:
        raise ValueError("There are no trading days between the provided times.")
    start_time_t = trading_calendar_inclusive[0]
    end_time_t = trading_calendar_inclusive[-1]

    initial_universe = market_data.universe_at_time(start_time_t)
    h = [None] * len(policies)
    for i in range(len(policies)):
        if h[i] is None:
            h[i] = pd.Series(0.0, initial_universe)
            h[i].iloc[-1] = initial_value

    n = len(policies)
    zip_args = zip(policies, [simulator] * n, [start_time] * n, [end_time] * n, h)

    # Setting processes=n explicitly (instead of the default, cpu_count())
    # is what avoids the MemoryError on this machine; _mp_init presumably
    # installs the shared lock used by cvxportfolio's cache helpers.
    with Pool(processes=n, initializer=_mp_init, initargs=(Lock(),)) as p:
        result = p.starmap(_worker, zip_args)

    print(list(result))


quant5 commented Sep 3, 2024

Obviously I'm not sure whether this is specific to my machine, Windows, etc., but if it makes it into any fix (e.g., a num_processes kwarg that can be set, defaulting to cpu_count()), let me know; otherwise I will work with setting up my own pool like this.

As a side note, if I remove the initializer=_mp_init, initargs=(Lock(),) part at the Pool call site, the example still works. I don't know enough about multiprocessing, so I'm just curious why that is necessary.
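My rough understanding, and this is an assumption about cvxportfolio internals that I have not verified, is that the lock passed through the initializer is there to serialize access to the on-disk cache across worker processes. The generic pattern looks something like this:

from multiprocessing import Lock, Pool

_LOCK = None


def _init(lock):
    # Runs once per worker process; stashes the shared lock in a module-level
    # global so that worker functions can use it.
    global _LOCK
    _LOCK = lock


def _task(i):
    # Hold the lock around the critical section (e.g., reading or updating a
    # shared cache file).
    with _LOCK:
        return i * i


if __name__ == "__main__":
    with Pool(processes=2, initializer=_init, initargs=(Lock(),)) as pool:
        print(pool.map(_task, range(4)))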


enzbus commented Sep 3, 2024

Sounds reasonable. I'll delve into the multiprocessing docs just to be sure, but it seems an innocuous addition. Feel free to open a PR yourself if you have it worked out. (Looks like you do.) Thanks!

enzbus closed this as completed in 31663bd on Sep 4, 2024
enzbus added a commit that referenced this issue Oct 24, 2024
This minor release contains various new features and fixes.

Features:
- new constraints Min/MaxHoldings, Min/MaxTradeWeights,
  Min/MaxTrades, FixedImbalance and NoCash (GH issue #180);
- improved ParticipationRateLimit constraint;
- improved exception reporting, now giving full path in
  evaluation tree where exception was raised (GH PR #176);
- redesigned forecast.py, no API changes, still work in progress
  for full support for regularized regression;
- added market_data = None option in Policy.execute; now Cvxportfolio
  policies can be executed (e.g., for online usage) without
  a MarketData server; all data needs to be provided separately
  to each individual object;
- AnnualizedVolatility utility object, for usage in risk constraints;
- added reject_trades_below and max_fraction_liquidity options
  to MarketSimulator, allowing filtering of both too-small and too-large
  trades (in simulation, using realized daily volumes);
- minor updates in examples;
- a few new sections in documentation manual;
- changed license from APACHE2 to GPLv3, see explanation in GH
  issue #166;
- moved documentation website to pydata-sphinx theme, and redesigned
  it a little;

Fixes:
- GH issue #146, cache file invalidation on user interrupt;
- GH issue #177, now using default Python multiprocessing; moved
  multiprocess to optional dependency;
- various smaller ones.

This release took a bit longer than the usual 2-3 months. We hope to
release 1.5.0 in about 2-3 months from now.