-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
load balancer! #3230
load balancer! #3230
Conversation
b6900b4
to
be92048
Compare
only doing this cause python has awful lambdas and can't loop in a meaningful way without doing this
Hey, thank you for working on this! @L-M-Sherlock will you work on implementing Easy Days then? |
I think @jakeprobst is more qualified for the task than I am. Easy Days is based on the load balancer. And it's quiet simple if we have implemented the load balancer: |
Is "avoid siblings" really necessary? I'm somewhat against adding a lot of options; more options means greater cognitive burden for the user and higher chances that the user will be like "Ugh, what is all of this complicated stuff, whatever, I don't want to mess with it" |
I have been trying for the last half an hour to just say that I'm happy this is being implemented but everything sounds so trite so I hope this doesn't. Now coming to load balancing, I agree with Expertium's sentiments. Is there a specific use case for this feature? Also, do you think load balance for a specific preset, accessible from the deck options would be useful? I imagine "Load balance each deck individually" should be able to achieve the same results but for decks with low number of cards, I doubt I will get preset level balancing I want. |
Interference due to siblings (even if they are a few days apart) is an important cause of incorrect grading (and subsequent forgetting) in my case. So, I want to have them as far from each other as possible. However, many people create cards from the same note that don't test the same fact (or reverse). An example would be image occlusion cards. Because of these different use cases, I think the "avoid siblings" option is important (just like we have options to control the burying of siblings). But, if we develop the disperse siblings feature later and find a way to make load balance and disperse work together, this "avoid siblings" option would become redundant. In that case, the users would exert their choice by enabling or disabling the disperse siblings feature. |
I think that balancing per preset makes more sense than balancing per deck, just because FSRS works on a per-preset basis, and it would be confusing if FSRS worked on a per-preset basis but Load Balance didn't. Personally, I would make it so that the user has to manually enable Load Balance for each preset. This sounds a bit inconvenient, but allows for more flexibility and allows us to remove one of the options. Everyone, please keep in mind that Anki is already considered very complex by most people (who aren't long-time Anki users), and we shouldn't add more options unless the benefits very clearly outweigh the extra complexity. And I think that having three options is far too complex. I would remove Avoid Siblings completely and remove "Load Balance each deck individually", but make it possible without explicitly saying so by making Load Balance non-global, so that it has to be enabled for each preset manually. |
I came back now in support of "avoid siblings". I understand load balancing as a "non randomisation of fuzz" and the original purpose of fuzz was to make sure cards are seperated from each other (avoiding interference). As user1283 mentions, for sibling cards, unlike other cards, even a few days of gap is consequential. Thus the further they are, the better. I would like Expertium to explain if such concerns are misplaced or why it shouldn't be a big deal. I sympathise with new users and yes, FSRS should be used by more but this is important actually. There is a risk of interference in getting sibling cards closer to each other in which case this feature is necessary. In other cases, for instance in my deck, most of my cards are siblings to each other but they test you on different things. Having siblings not load balanced would be detrimental in this case as "avoid siblings" would end up avoiding 90 per cent of my cards. I hope my understanding isn't wrong. @Expertium I think you're right about presetlevel balancing. There's also one advantage of it. As cards with intervals above a certain number (90) won't get load balanced, my older decks might not get properly load balanced. In preset level balancing that can still mean I'm doing around the same number of Kanji writing cards daily because... well it's taking the whole preset into account. As for why it's better than collection level balancing is because different type of material take different time. Let's say your Bio decks take you 10 secs per card. Your Chem decks on the other hand take 20 secs. Collection level balancing here can give you different numbers for Bio and Chem. But per preset ensures you're spending almost the same amount of time every day doing your reviews. |
Load Balance can be disabled for siblings, just without making it an option that the user has to think about. Like how you can't control fuzz. There is no option to enable/disable fuzz for some cards, right?
Load Balancing balances number of cards, not time. So if you spend a different amount of time per card for different material, Load Balancer won't know that. |
Yes I'm aware it's not balancing time but that's exactly my point. In my case, Kanji writing cards take a lot of time, that was the reason I thought deck level balancing might come in handy. Well preset level is better though because I have around 13 decks for that. For sibling cards, if they don't get balanced, wouldn't that effect load balancing? Particularly if you have a lot of siblings? My notes can have upto a dozen siblings and more so I'm concerned about that. |
How about doing what we did with Disperse Siblings - add an extra rule that siblings must be at least one day apart from each other? So that Load Balancer can never put them on the same day? Otherwise balance them the same way as other cards |
Extra rule should be that if siblings are coming closer they shouldn't be balanced. Oh, now that sounds workable. Edit: Wait no. There needs to be testing first. Preferably in a deck with a lot of siblings (typical IO cards). |
You are just trying to hide the truth: LB HAS a little impact on retention rate at the present moment (ever so slightly, but still) - users need to know about it.
You're just contradicting yourself - if you against giving information to user about LB in pop up and you're in favor of quiet and non turn-offable adding of LB , then why you need to create page for it??? Stop talking nonsense, please. |
It is also worth to note, that result of open-spaced-repetition/load-balance-simulator#1 investigation were obtained using default weights and 20 new cards limit. There are no guarantees to expect that for user-weights and a different number of new cards per day there is not an even greater regression in retention rate. |
I will put my opinions out there nonetheless.
I haven't much contributed anyway but I wish I could.
And so does fuzz, but the averages are closer to DR. This won't change that picture much.
We would need to update the current documentation of Fuzz, which is something most people haven't and won't read. I am part of the "Anki is too complex" camp so I'll be against a new section. |
@YukiNagat0 I don't know how much you value my opinion, but here it is. Btw, @brishtibheja do you intend to mention the console command in the manual? |
AFAIK console command is temporary so no need to mention that. If there are no obvious issues it will certainly be removed. If there are issues, I wouldn't expect users to know where to read in the manual (studying section > fuzz). |
@jakeprobst would you please test LB with different |
@YukiNagat0 care to provide some hard numbers showing that load balancer will affect retention in a negative way? As it stands people have sufficiently proven it is not a problem at all and I would need real solid evidence of the opposite to feel the need to change things currently.
I'm not sure I can meaningfully measure this, the perf stuff is really about targetting some low end hardware and anything I do on my desktop is not going to be particuarly representative of those systems. Especially since I think its more IO bound than cpu bound because of the sql query than actual load balancer construction? |
@jakeprobst, The thing is that exciting results of synthetic tests (which are shown in open-spaced-repetition/load-balance-simulator#1) are not representing real picture. Again, there are no guarantees, that user specific settings (weights, new card limit, desired retention) will not result in drastic/significant regression in true retention. Why not add checkbox and leave it for about a year, just like it was with Schedular V3? When we get enough feedback from users with real word settings and reviews, then we can remove checkbox. A checkbox won't make things worse, but will allow a person to compare their daily load on the fly without having to go to the console. Not speaking of the fact that literally no one knows about it existence in the first place.
|
Let us not generate more noise. This is turning into yap-olympics, to quote voczi. I would like this to get merged instead and some real testing from the RC users. |
@Expertium, If you are so confident in LB, then can you explain how it will influence my daily load? Spike in red is fresh cards (first interval after graduation). The only thing, that LB can do is to move excessive amount to sunken part. Which will result in offset by 3-4 days (or even to 7-10 days). How do you know, that it won't result in lowering my retention? And, also, how LB will influence on right part (after 35 days) considering the fact, that these are the cards within threshold 90. |
@Expertium I don't want to slow things down on low-end systems / large collections unnecessarily, and don't want to waste CPU resources balancing the distant future, when it's likely to be balanced as it comes closer anyway. @YukiNagat0 I think the others have made a fairly convincing argument for LB not being appreciably different to fuzz when it comes to retention. Every extra option makes the program more complicated, and based on current information, it does not seem to be justified here.
LB uses the same day range as fuzz does, so such a thing would not happen. I believe you are reacting out of fear of the unknown here. If you don't trust what the others are saying, I suggest you give it a try when it reaches a beta, and see if it actually works for you in practice. @jakeprobst thanks for all your work here, I'll try to get to this later today or tomorrow. |
Aside from the comments above, the only other suggestion I have is that a few comments and/or docstrings in load_balancer.rs might be nice. Code looks nice and clean! |
@dae lookin good to go for real this time. properly doing the preset balance now (shows me for only having 1 deck, this was not something I was gonna catch in my personal usage of this). also added a handful of comments explaining a bit how it works. |
Thanks for all your work on this Jake, and for everyone who contributed to the design discussions. |
* start of load balancer * add configuration options; option to load balance per deck * formatting * clippy * add myself to contributors * cleanup * cargo fmt * copyright header on load_balancer.rs * remove extra space * more formatting * python formatting * ignore this being None only doing this cause python has awful lambdas and can't loop in a meaningful way without doing this * only calculate notes on each day if we are trying to avoid siblings * don't fuzz intervals if the load balancer is enabled * force generator to eval so this actually happens * load balance instead of fuzzing, rather than in addition to * use builtin fuzz_bounds rather than reinvent something new * print some debug info on how its load balancing * clippy * more accurately load balance only when we want to fuzz * incorrectly doublechecking the presence of the load balancer * more printfs for debugging * avoid siblings -> disperse siblings * load balance learning graduating intervals * load balancer: respect min/max intervals; graduating easy should be at least +1 good * filter out after-days under minimum interval * this is an inclusive check * switch load balancer to caching instead of on the fly calculation * handle case where load balancer would balance outside of its bounds * disable lb when unselecting it in preferences * call load_balancer in StateContext::with_review_fuzz instead of next to * rebuild load balancer when card queue is rebuilt * remove now-unused configuration options * add note option to notetype to enable/disable sibling dispersion * add options to exclude decks from load balancing * theres a lint checking that the link actually exists so I guess I'll add the anchor back in later? * how did I even update this * move load balancer to cardqueue * remove per-deck balancing options * improve determining whether to disperse siblings when load balancing * don't recalculate notes on days every time * remove debug code * remove all configuration; load balancer enabled by default; disperse siblings if bury_reviews is set * didn't fully remove caring about decks from load balancer sql query * load balancer should only count cards in the same preset * fuzz interval if its outside of load balancer's range * also check minimum when bailing out of load balancer * cleanup; make tests happy * experimental weight-based load balance fuzzing * take into account interval when weighting as it seems to help * if theres no cards the interval weight is just 1.0 * make load balancer disableable through debug console * remove debug prints * typo * remove debugging print * explain a bit how load balancer works * properly balance per preset * use inclusive range rather than +1 * -1 type cast * move type hint somewhere less ugly; fix comment typo * Reuse existing deck list from parent function (dae) Minor optimisation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it took quite some work. Thank you @jakeprobst for that. 🙂
Could it be that this pull request introduced two type errors? I'm getting these two errors from mypy
:
pylib/anki/collection.py:976: error: "type[Bool]" has no attribute "LOAD_BALANCER_ENABLED" [attr-defined]
pylib/anki/collection.py:979: error: "type[Bool]" has no attribute "LOAD_BALANCER_ENABLED" [attr-defined]
These type errors aren't present if you checkout the commit before this pull request was merged and run mypy
. So they must've been introduced by this pull request. Or is this a misconfiguration on my personal system?
def _get_enable_load_balancer(self) -> bool: | ||
return self.get_config_bool(Config.Bool.LOAD_BALANCER_ENABLED) | ||
|
||
def _set_enable_load_balancer(self, value: bool) -> None: | ||
self.set_config_bool(Config.Bool.LOAD_BALANCER_ENABLED, value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this change, I get these two type errors from mypy
:
pylib/anki/collection.py:976: error: "type[Bool]" has no attribute "LOAD_BALANCER_ENABLED" [attr-defined]
pylib/anki/collection.py:979: error: "type[Bool]" has no attribute "LOAD_BALANCER_ENABLED" [attr-defined]
Do you get them too @jakeprobst? Or is this a misconfiguration on my personal system?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regen your protos (probably)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do I do that?
Edit: I found out how. In case someone else discovers this comment, I documented how you generate the required Python code from the *.proto
files in #3390.
This is something of a large feature, and as such I do not expect this to be merged without a good amount of discussion/iteration. But I hope that showing up with some code and an intent to see it through will start a discussion on how to fully implement this and get it in to Anki proper.
This is mostly based on the old load balancer addon from days of yore.
Differences from the original addon:
In the relevant issue #3116 there is mention of a performance optimization of precalculating all the due date values ahead of time. For fear of cache suffering I opted to calcuate all values on the fly. Its a handful of small sql queries (three queries per card, three ints per row, at most each returning a few thousand rows for those with intense anki workflow habits) on every card answer and I would be surprised if any system that can run anki would have issues with it (if this is not the case I can totally address this, but I am very unsure of its necessity and it would drastically increase the complexity). The original addon did the same queries and I never once got a complaint about performance. I did implement it in AnkiDroid for my personal use (with accompanying PR that went nowhere) and my low-end-in-2014 phone had no issues with a workload of ~150 cards a day.
Possible issues:
Forum thread: https://forums.ankiweb.net/t/native-implementation-of-load-balance/46007