Bars notebook - possible corrections #9
Comments
@gahobbsau and @BlackArbsCEO: Have you tried playing with the different runs bars? I have trouble keeping their dynamics from becoming very unstable, and I see the same problem with the imbalance bars.
As I read the text, the expectation values for T (the number of ticks in a bar) and for the imbalance or run lengths can simply be estimated as exponentially weighted moving averages of the same quantities measured on previous bars. Hence E_0[T] is nothing but the exponentially weighted moving average of the number of ticks in the preceding bars. Likewise, the expected imbalance or run length is the exponentially weighted average of the imbalance or run lengths of the preceding bars, measured per tick. Measuring them per tick is what makes them "probabilities", and it is why it makes sense to multiply them by E_0[T] to estimate what theta should be for the next bar. This is a slightly different understanding of the text than what I think you have implemented, @BlackArbsCEO, but it could easily be me who hasn't understood the text, your implementation, or maybe both :)
The problem I run into comes in different flavors, depending on the decay I choose for the exponentially weighted moving averages. I get the best results with very low decays; however, when a large imbalance or extreme run suddenly appears, the change in theta has a severe effect for a very long time. On rare occasions this ends in a strong feedback loop in which bars go, within 10 steps (10 bars), from consisting of 50-100 ticks to suddenly 200000 ticks. Have you experienced something similar?
I would like to share my code with you if you are interested, but I am not sure what the appropriate way of doing that is here. Please let me know.
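To make the reading described above concrete, here is a minimal sketch of tick imbalance bars in which both E_0[T] and the per-tick imbalance are exponentially weighted moving averages over previous bars. It is only an illustration of that interpretation, not the notebook's code or the book's reference implementation. The function names, the initial values `init_T` and `init_imb`, the smoothing factor `alpha`, and the optional `max_T` cap are all assumptions introduced here; the cap in particular is a hypothetical guard against runaway bar lengths and does not come from the text.

```python
def ewma_update(prev, new, alpha):
    """One step of an exponentially weighted moving average."""
    return alpha * new + (1.0 - alpha) * prev


def tick_imbalance_bars(signs, init_T=50.0, init_imb=0.1, alpha=0.05, max_T=None):
    """Return the position of the last tick of each bar.

    signs    : iterable of +1/-1 tick signs.
    init_T   : assumed starting value for E_0[T] (ticks per bar).
    init_imb : assumed starting value for the expected imbalance per tick.
    alpha    : EWMA weight given to the most recent bar.
    max_T    : optional hard cap on ticks per bar (hypothetical safeguard).
    """
    exp_T, exp_imb = float(init_T), float(init_imb)
    theta, n_ticks = 0.0, 0
    bar_ends = []
    for i, b in enumerate(signs):
        theta += b
        n_ticks += 1
        # Threshold for the current bar: E_0[T] * |expected imbalance per tick|
        threshold = exp_T * abs(exp_imb)
        capped = max_T is not None and n_ticks >= max_T
        if abs(theta) >= threshold or capped:
            bar_ends.append(i)
            # Update both expectations from the bar that just closed
            exp_T = ewma_update(exp_T, n_ticks, alpha)
            exp_imb = ewma_update(exp_imb, theta / n_ticks, alpha)
            theta, n_ticks = 0.0, 0
    return bar_ends


# Ten consecutive buy ticks with deliberately small starting values
print(tick_imbalance_bars([1] * 10, init_T=4, init_imb=0.5, alpha=0.5))  # [1, 4, 7]
```

Because the threshold is the product of two estimates that are both updated from the bar they just produced, a single unusually long bar raises E_0[T], which raises the next threshold, which tends to produce another long bar. That is one plausible mechanism for the feedback loop described above.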
In the supplied requirements.txt file: change sklearn>=0.19.1 to scikit-learn>=0.19.1. I created a clean new Conda environment for these Adv_Fin_ML_Exercises notebooks (created using Visual Studio, though creating it with Anaconda Navigator produces an environment with the same limited set of packages). I then pip installed the supplied requirements.txt file (found in the top folder) and conda installed the Jupyter Notebook and JupyterLab packages into this environment. On running the Initialisation block of code, the following packages were reported as missing and could be added to the requirements.txt file:
After this, running the notebook generated warnings relating to the theano package, along with some recommended conda installs, which I duly applied to the environment.
Thanks guys for the updates. I'll review the proposed changes and try to incorporate them or reply within a few days. @aldebaransearch you can create a pull request containing your notebook and scripts as they pertain to the topic. I'll review it and, assuming everything is good, I'll approve the merge.
This great article on de Prado's handling of bars came out last week:
@flamby thanks for sharing, the linked article looks informative and well done.
While running the code in the Bars notebook, I found the following possibly required corrections to the code.
At Volume Bars, 2nd block
At > v_bar_df = volume_bar_df(df, 'v', 'price', volume_M)
Corrected to > v_bar_df = volume_bar_df(df, 'v', volume_M)
ERROR message: TypeError: volume_bar_df() takes 3 positional arguments but 4 were given
At Dollar Value Bars, 1st block
def dollar_bars()
At > t = df[column]
Corrected to > t = df[dv_column]
Reason: to match argument name
At Dollar Value Bars, 2nd block
At > dv_bar_df = dollar_bar_df(df, 'dv', 'price', dollar_M)
Corrected to > dv_bar_df = dollar_bar_df(df, 'dv', dollar_M)
ERROR message: TypeError: dollar_bar_df() takes 3 positional arguments but 4 were given
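For reference, the sketch below shows helper functions whose signatures are consistent with the corrected three-argument calls above. The function bodies are an assumption (the notebook's own definitions are not reproduced in this thread): a bar is closed each time the running sum of the chosen column reaches the threshold. The helper `threshold_bars` and the sample `ticks` frame are hypothetical names used only for illustration.

```python
import pandas as pd


def threshold_bars(df, column, m):
    """Return row positions where the running sum of `column` reaches m."""
    idx = []
    running = 0
    for i, x in enumerate(df[column].to_numpy()):
        running += x
        if running >= m:
            idx.append(i)
            running = 0  # start accumulating the next bar
    return idx


def volume_bar_df(df, volume_column, m):
    """Three positional arguments: frame, volume column name, threshold."""
    return df.iloc[threshold_bars(df, volume_column, m)]


def dollar_bar_df(df, dv_column, m):
    """Three positional arguments: frame, dollar-value column name, threshold."""
    return df.iloc[threshold_bars(df, dv_column, m)]


# Hypothetical tick data using the column names from the calls above
ticks = pd.DataFrame({
    "price": [10.0, 10.5, 10.2, 10.4, 10.1],
    "v": [100, 200, 50, 300, 150],
})
ticks["dv"] = ticks["price"] * ticks["v"]

v_bar_df = volume_bar_df(ticks, "v", 300)
dv_bar_df = dollar_bar_df(ticks, "dv", 3000)
```

With signatures like these, passing 'price' as an extra positional argument raises the TypeError quoted above, because the price column is not a parameter of either function.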
Initialisation block
In my environment, at > import pandas_datareader.data as web
After "import pandas as pd" I had to add the following line before the datareader line.
pd.core.common.is_list_like = pd.api.types.is_list_like
ERROR message: python datareader cannot import name 'is_list_like'
CHANGE made as per Answer 55 at https://stackoverflow.com/questions/50394873/import-pandas-datareader-gives-importerror-cannot-import-name-is-list-like
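Put together, the import order in the Initialisation block would look roughly like the sketch below. The `hasattr` guard and the `try`/`except` around the datareader import are additions made here so the snippet also runs on pandas versions that do not need the workaround or in environments without pandas_datareader; they are not part of the original fix.

```python
import pandas as pd
import pandas.core.common as pd_common
from pandas.api.types import is_list_like

# Older pandas_datareader releases import is_list_like from pandas.core.common;
# restore the name there only if this pandas version no longer provides it.
if not hasattr(pd_common, "is_list_like"):
    pd_common.is_list_like = is_list_like

try:
    import pandas_datareader.data as web  # must come after the patch above
except ImportError:
    web = None  # pandas_datareader is not installed in this environment
```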
Suggestions:
a) QtConsole: I have found it very helpful to add the following line to the Initialisation block, so as to open a QtConsole on the current notebook kernel for interactive work without cluttering the notebooks.
%qtconsole
For example, I saved a lot of the variable datasets to csv files for inspection; also checking dtypes and the like.
b) Variable Inspector for Notebooks: Install jupyter_contrib_nbextensions (and the jupyter_nbextensions_configurator)
c) Switch plots between inline and interactive: In the QtConsole, switch between the two with:
%matplotlib qt  # interactive plotting in a separate window
%matplotlib inline  # normal charts inside notebooks
utils conflict
a) In my environment, I also found that I needed to rename the file utils both in the src folder and in the Notebook.
I trust that these may assist.
Thank you greatly for sharing your implementations in the 2 notebooks. They have assisted greatly in understanding the book and contribute towards the possibility of applying its work.
On reading discussion in some of the other comments, I am encouraged to find that I am not the only one who finds some of the notation obscure or ambiguous.