MultiPeriodOptimization Behavior Documentation #117

Open
pcgm-team opened this issue Nov 30, 2023 · 15 comments
Labels
docs Improve documentation examples Improve examples question

Comments

@pcgm-team

With a user-provided returns forecast that has a single forecasted return per timestamp, how does the model use the forecast?

Does it look at the next planning_horizon bars of the forecast on that day (which would introduce look-ahead bias if the forecast was generated with data up to that timestamp), or does it persist the same forecast for planning_horizon additional bars of optimization?

I've read the docs and paper, but the way it is implemented is very unclear.

@enzbus enzbus added question docs Improve documentation examples Improve examples labels Nov 30, 2023
@enzbus
Collaborator

enzbus commented Nov 30, 2023

EDIT: I hadn't read your comment well. If you do nothing it simply carries over the value for the day (no look-ahead).

Hello, I also need to improve the docs on this. There are two ways to do it. One is to build an MPO policy by providing a list of objectives and a list of lists of constraints:

mpo_policy = cvx.MultiPeriodOptimization(
    objective = [cvx.ReturnsForecast(signal_today_dataframe) - gamma_risk * cvx.FullCovariance() - ...,
                cvx.ReturnsForecast(signal_tomorrow_dataframe) - gamma_risk * cvx.FullCovariance() - ...],
    constraints = [[cvx.LeverageLimit(3)],
                [cvx.LeverageLimit(3)]]
)

so the planning_horizon argument is unused. In this way you can change all the terms you want for each MPO step.
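To make the difference between the carry-over behaviour and explicit per-step signals concrete, here is a minimal plain-Python sketch. It is a toy model of the idea, not Cvxportfolio's code: the helper `per_step_forecasts` and the dictionary-based signal tables are hypothetical, introduced only for illustration.

```python
from datetime import date


def per_step_forecasts(signals_by_step, t, planning_horizon):
    """Toy model: forecast used at each MPO step for back-test time t.

    signals_by_step is a list of {timestamp: forecast} tables; entry k holds
    the forecast for the k-th period ahead. If fewer tables than steps are
    given, the last table is carried over. Only the row at t is ever read,
    so no later row (no look-ahead) enters the plan.
    """
    steps = []
    for k in range(planning_horizon):
        table = signals_by_step[min(k, len(signals_by_step) - 1)]
        steps.append(table[t])  # always the row at t, never a later row
    return steps


monday = date(2023, 11, 27)
tuesday = date(2023, 11, 28)
signal_today = {monday: 0.010, tuesday: -0.020}
signal_tomorrow = {monday: 0.004, tuesday: 0.001}

# One signal only: Monday's value is reused for every step of the plan.
single = per_step_forecasts([signal_today], monday, planning_horizon=3)

# One signal per step: each step gets its own value, all read at Monday's row.
explicit = per_step_forecasts([signal_today, signal_tomorrow], monday, planning_horizon=2)
```

In both cases Tuesday's row is never consulted when planning on Monday.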

The other, in case you simply have slow and fast signals, is to use the decay argument of ReturnsForecast:

mpo_policy = cvx.MultiPeriodOptimization(
    objective = cvx.ReturnsForecast(fast_signal_dataframe, decay = .2)
              + cvx.ReturnsForecast(slow_signal_dataframe, decay = .8)
              - gamma_risk * cvx.FullCovariance(),
    constraints = [cvx.LeverageLimit(3)],
    planning_horizon = 5,
)
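As a rough illustration of what a decay factor does over the planning horizon, the sketch below assumes that the forecast used at step k is today's forecast multiplied by decay ** k, with step 0 being the current period. That scaling rule is an assumption stated here for illustration (check the ReturnsForecast documentation for the exact definition), and the numbers are made up.

```python
def decayed_forecasts(forecast_today, decay, planning_horizon):
    """Assumed rule: the step-k forecast is forecast_today * decay ** k."""
    return [forecast_today * decay ** step for step in range(planning_horizon)]


# Made-up forecasts for a single asset.
fast = decayed_forecasts(0.010, decay=0.2, planning_horizon=5)
slow = decayed_forecasts(0.002, decay=0.8, planning_horizon=5)

# The two terms are summed in the objective, so the effective forecast per step is:
combined = [f + s for f, s in zip(fast, slow)]
```

Under this assumption the fast signal is almost gone after two steps, while the slow signal still contributes most of its value at the end of the horizon.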

Let me know if this answers your question.

@pcgm-team
Author

Thank you, that clarifies it perfectly. Very helpful.

@enzbus enzbus reopened this Nov 30, 2023
@enzbus
Collaborator

enzbus commented Nov 30, 2023

I need to improve the docs for this; it might take a while so at least if someone comes here to ask something similar they find this.

@pcgm-team
Author

pcgm-team commented Dec 1, 2023

One more question about the indexing of signal_today_dataframe and signal_tomorrow_dataframe in

mpo_policy = cvx.MultiPeriodOptimization(
    objective = [cvx.ReturnsForecast(signal_today_dataframe) - gamma_risk * cvx.FullCovariance() - ...,
                cvx.ReturnsForecast(signal_tomorrow_dataframe) - gamma_risk * cvx.FullCovariance() - ...],
    constraints = [[cvx.LeverageLimit(3)],
                [cvx.LeverageLimit(3)]]
)

Should the index be the date on which the prediction is made, or the date it is predicting?
I.e., let's say I have a dataframe with columns:
t symbol pred_{t+1} pred_{t+2} pred_{t+3}

Is signal_tomorrow_dataframe going to be
t pred_{t+2} (so the index is the date the prediction was made)
or should it be:
t+1 pred_{t+2}
Or even
t pred_t

@enzbus
Collaborator

enzbus commented Dec 1, 2023

The timestamp refers to the time of each period in the back-test, say 9:30am EST on a Monday. Then signal today, at timestamp 9:30am Monday, is the prediction of the return from 9:30am Monday to 9:30am Tuesday; signal tomorrow, at timestamp 9:30am Monday, is the prediction of the return from Tuesday to Wednesday; and so on. The time convention is the one defined in the paper, section 2. In practice you can assume that the signal for today is built knowing all data up to the open price of today (including the open-to-open total return from yesterday's open to today's open). Does that make sense? I should definitely make these things clearer in the docs. The policy objects receive a view (past_returns) of the open-to-open total returns up to the open of the day, as a dataframe. ReturnsForecast without arguments simply computes a .mean() of that, so each day it takes the full mean of all past returns for each name. In the user provided forecasters example you can see how to use the same model to do arbitrary forecasting.
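The convention above can be sketched with plain dictionaries standing in for dataframes. Everything here is hypothetical illustration data: `predictions` plays the role of a model's output, and `predicted_period` is a helper written only to spell out which open-to-open period each entry refers to (daily periods are assumed, ignoring weekends and holidays).

```python
from datetime import datetime, timedelta

monday_open = datetime(2023, 12, 4, 9, 30)  # a Monday, 9:30am
opens = [monday_open + timedelta(days=d) for d in range(3)]  # Mon, Tue, Wed

# Hypothetical model output: at each open, predictions for the next two
# open-to-open periods, built only from data known at that open.
predictions = {
    opens[0]: (0.010, 0.004),   # made Monday: Mon->Tue, Tue->Wed
    opens[1]: (0.003, -0.002),  # made Tuesday: Tue->Wed, Wed->Thu
    opens[2]: (-0.001, 0.005),  # made Wednesday: Wed->Thu, Thu->Fri
}

# Both signals share the same index: the back-test timestamp at which
# the prediction is available and the trade is planned.
signal_today = {t: p[0] for t, p in predictions.items()}
signal_tomorrow = {t: p[1] for t, p in predictions.items()}


def predicted_period(t, step):
    """Open-to-open period that the step-th signal at timestamp t refers to."""
    start = t + timedelta(days=step)
    return (start, start + timedelta(days=1))
```

For example, `predicted_period(opens[0], 1)` is the Tuesday-to-Wednesday period, which is what `signal_tomorrow[opens[0]]` forecasts.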

@pcgm-team
Author

pcgm-team commented Dec 1, 2023

So if I have a dataframe signal_day_after_tomorrow (whose index is the date the signals are created), should I shift it forward by two when I feed it into the objective?

@enzbus
Collaborator

enzbus commented Dec 1, 2023

I think in your formalism (comment before) it's t+1 pred t+1 for signal today, and so on. The timestamp in the signal dataframe is such that the prediction at that timestamp is done using all data up to the price at that timestamp. For signals for the future it's the same, but you predict a future quantity.

@pcgm-team
Author

pcgm-team commented Dec 1, 2023

I'm still not 100% sure, but it sounds like this: if I have signal1 and signal2 whose index is the index output by a regression (so the date at which we predict the next day's target return for signal1, and that target shifted back by 1 for signal2), the correct way to use the optimizer would be something like:

mpo_policy = cvx.MultiPeriodOptimization(
    objective = [cvx.ReturnsForecast(signal1.shift(1)) - gamma_risk * cvx.FullCovariance() - ...,
                cvx.ReturnsForecast(signal2.shift(2)) - gamma_risk * cvx.FullCovariance() - ...],
    constraints = [[cvx.LeverageLimit(3)],
                [cvx.LeverageLimit(3)]]
)

I suppose what I'm confused about is whether it would instead be

mpo_policy = cvx.MultiPeriodOptimization(
    objective = [cvx.ReturnsForecast(signal1) - gamma_risk * cvx.FullCovariance() - ...,
                cvx.ReturnsForecast(signal2.shift(1)) - gamma_risk * cvx.FullCovariance() - ...],
    constraints = [[cvx.LeverageLimit(3)],
                [cvx.LeverageLimit(3)]]
)

@enzbus
Collaborator

enzbus commented Dec 2, 2023

You've got to think about the way data is consumed by your machine learning model that produces the signal. That's why you can take the user provided forecasters example as a starting point https://github.com/cvxgrp/cvxportfolio/blob/master/examples/user_provided_forecasters.py .

The row of your signal that has timestamp t must be built with data that was available at time t. That's the case for all forecasted quantities: returns for the period, for the next period, volumes, risk model parameters, and so on. (If you don't have that property you're doing look-ahead and any analysis is invalid.)
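One way to test this property on your own forecaster is to perturb the data from time t onward and confirm the forecast at t does not change. The sketch below is a generic check, not part of Cvxportfolio; the function names are hypothetical. It assumes `returns[i]` is the return realized over period i, so at the start of period t only `returns[:t]` is known. The past-mean forecaster mimics the mean-of-past-returns idea described earlier in the thread.

```python
from statistics import mean


def past_mean_forecast(returns, t):
    """Forecast for period t using only returns realized before t."""
    history = returns[:t]
    return mean(history) if history else 0.0


def leaky_forecast(returns, t):
    """Deliberately wrong: uses the full sample, including the future."""
    return mean(returns)


def uses_only_past(forecaster, returns, t):
    """True if changing the data from t onward leaves the forecast at t unchanged."""
    tampered = returns[:t] + [r + 1.0 for r in returns[t:]]
    return forecaster(returns, t) == forecaster(tampered, t)


returns = [0.01, -0.02, 0.03, 0.00, 0.01]
ok = uses_only_past(past_mean_forecast, returns, 3)
leaks = not uses_only_past(leaky_forecast, returns, 3)
```

A forecaster that passes this check for every t in the back-test has no look-ahead with respect to the returns series; the same idea applies to any other input data.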

@enzbus
Collaborator

enzbus commented Feb 28, 2024

An improved explanation of the multi-period optimization model was just merged into master; it can be seen in the development version of the docs: https://www.cvxportfolio.com/en/master/manual.html#multi-period-optimization

@miller-moore

Hi @enzbus, I'm trying to generate my own MPO backtest example since the one in the docs is not yet complete. Currently I am getting a MissingTimesError that isn't caused by a timezone mismatch. In my case it is probably a misspecification between the ReturnsForecast dataframes and my user provided market data. In any case, do you have an estimate for when the example may be complete? I could work up an MRE if you think there could be a bug in the loop over trading calendar indices. Thanks

@enzbus
Collaborator

enzbus commented Jun 4, 2024

Does the section in the manual (https://www.cvxportfolio.com/en/stable/manual.html#multi-period-optimization) or this discussion (#139) help? Generally, what people have trouble with is making sure indexing is done by the time of execution (in the back-test sense), not the time of the prediction.

@miller-moore

miller-moore commented Jun 4, 2024

Yes, I have read the full docs and the full paper. My forecast dataframes and market-data dataframes are both indexed by open timestamps. The forecast values correspond to the forecasted returns in the periods that follow, while the market-data returns dataframe contains the actual returns in the periods that follow.

For example, by my formalism with a planning period of 3, signal_today_dataframe has index open_t and the forecast corresponds to the forecasted return between open_t and open_t+1, signal_next_dataframe has index open_t and the forecast corresponds to the forecasted return between open_t+1 and open_t+2, and finally signal_next_next_dataframe has index open_t and the forecast corresponds to the forecasted return between open_t+2 and open_t+3.

Regarding market data, my returns dataframe also has index open_t, identical to the index of each forecast dataframe, and each value is the actual return between open_t and open_t+1. Is this structure what the API expects, and is it true that all forecast dataframes need to have an index identical to that of each market-data dataframe?

My desire for an example with data is simply that it could help clear up any ambiguity in the expectation of mutual structure between the forecasts and market-data but answers to the above should be enough to get me there eventually.

Thanks

@enzbus
Collaborator

enzbus commented Jun 4, 2024

Thanks for the clarifications, and yes that is indeed the correct format. I don't have an ETA for the complete restoration of the 2016 example notebooks, but if you paste a trace-back of the specific error you're getting I might help.
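When chasing a missing-timestamp error with this layout, a quick diagnostic is to compare the market-data index with each forecast index before running the back-test. The sketch below is a generic check written for illustration; it does not reproduce the library's own validation, and `missing_timestamps` is a hypothetical helper. Plain lists of dates stand in for dataframe indexes.

```python
from datetime import date, timedelta


def missing_timestamps(market_index, forecast_index):
    """Timestamps present in the market data but absent from the forecast."""
    return sorted(set(market_index) - set(forecast_index))


start = date(2024, 6, 3)
market_index = [start + timedelta(days=d) for d in range(5)]

# Hypothetical forecast table that lost one row (the third day).
forecast_index = market_index[:2] + market_index[3:]

gaps = missing_timestamps(market_index, forecast_index)
```

An empty result for every forecast table means the indexes line up; any timestamp reported here is a back-test period for which the policy would have no forecast row.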

@miller-moore

I will generate an MRE if I continue to have trouble. Thank you for the verification!
