Fine-tune the sample size for CMRR.py #136
So here's the idea: CMRR takes very little time to run for small values of days_to_simulate, which means that we can increase the sample size without worrying about CMRR taking long enough to annoy the user. Instead of writing a whole bunch of else-if statements, I made a function that calculates the "best" sample size. Specifically, it calculates what sample size would make CMRR run roughly as long as it would with sample_size=4 and days_to_simulate=365. sample_size=4 and days_to_simulate=365 is used as the default, and then if days_to_simulate is lower, sample size is increased to make the total run time approximately constant.
Did you verify it?
Only "on paper". I can't verify it without having a version of Anki that actually implements it.
OK. I will test it later.
I tested it in Python and it seems that my approximation provides a decent fit, but it could be better. I improved the coefficients.
Here are the times with the new method for calculating the sample size:
I don't know how I managed to delete "best" twice today
Ok, I changed the coefficients and the output value for [...]. I'm sure that this is ready to be merged now.
Well, give me the raw data again. And make it start from 0.50.
If I revert this PR, the width of the green area will be larger than the current one, right? If so, I won't revert it.
Alright. Release a new version of the Google Colab optimizer so I can test and see if the graphs look ok.
You can modify this code to install fsrs-optimizer from your GitHub repo and branch.
Reference: https://stackoverflow.com/questions/20101834/pip-install-from-git-repo-branch
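For anyone following along, here is a rough sketch of what installing from a fork's branch could look like, using pip's `git+https://...@branch` requirement syntax. The owner and branch names are placeholders, not the actual fork discussed in this thread, and both helper functions are hypothetical names made up for illustration.

```python
import subprocess
import sys


def git_requirement(owner: str, repo: str, branch: str) -> str:
    """Build a pip requirement string pointing at a branch of a GitHub repo."""
    return f"git+https://github.com/{owner}/{repo}.git@{branch}"


def install_from_branch(owner: str, repo: str, branch: str) -> None:
    """Install the package with the pip that belongs to the running interpreter."""
    requirement = git_requirement(owner, repo, branch)
    subprocess.check_call([sys.executable, "-m", "pip", "install", requirement])


# Placeholder values: substitute your own GitHub user name and branch.
# install_from_branch("your-github-user", "fsrs-optimizer", "your-branch")
```

In a Colab cell the same thing is usually written as a single `!pip install "git+https://github.com/<user>/<repo>.git@<branch>"` line.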
The raw data:
Don't mind me, I'm just trying things
Alright, should be good, you can merge it. Don't forget to add the new method of calculating the sample size to the Rust version.
Ensure that nothing is plotted above the box with the graph
LGTM
aef1af1
So here's the idea: CMRR takes very little time to run for small values of days_to_simulate, which means that we can increase the sample size without worrying about CMRR taking long enough to annoy the user.
Instead of writing a whole bunch of else-if statements, I made a function that calculates the "best" sample size. Specifically, it calculates what sample size would make CMRR run roughly as long as it would with sample_size=4 and days_to_simulate=365.
sample_size=4 and days_to_simulate=365 is used as the default, and then if days_to_simulate is lower, sample size is increased to make the total run time approximately constant for days_to_simulate <= 365.
You can run something like:
to see what sample size values it outputs for each value of days_to_simulate.
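As an illustration of the idea only, and not the fitted function from this PR, here is a minimal sketch. It assumes run time is simply proportional to sample_size × days_to_simulate. The PR itself uses fitted coefficients that are not reproduced here, so the real outputs will differ. The name `best_sample_size` is made up for this sketch.

```python
DEFAULT_SAMPLE_SIZE = 4
DEFAULT_DAYS = 365


def best_sample_size(days_to_simulate: int) -> int:
    """Hypothetical sketch: keep sample_size * days_to_simulate roughly constant.

    Assumes run time scales linearly with both factors, which is a
    simplification and not the fitted approximation used in the PR.
    """
    if days_to_simulate <= 0:
        raise ValueError("days_to_simulate must be positive")
    if days_to_simulate >= DEFAULT_DAYS:
        # At or above the default horizon, fall back to the default sample size.
        return DEFAULT_SAMPLE_SIZE
    # Total simulated days allowed by the default settings.
    budget = DEFAULT_SAMPLE_SIZE * DEFAULT_DAYS
    return max(DEFAULT_SAMPLE_SIZE, round(budget / days_to_simulate))


if __name__ == "__main__":
    # Print the sample size chosen for a few simulation lengths.
    for days in (30, 60, 90, 180, 365, 730):
        print(days, best_sample_size(days))
```

Under this linear assumption the sample size never drops below the default of 4 and grows as days_to_simulate shrinks.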
Don't forget to add this to FSRS-rs and merge it into Anki.