Expt/siblings #142

Draft · wants to merge 6 commits into main
Conversation

@L-M-Sherlock (Member)
No description provided.

@Expertium (Contributor)
As I said on Discord:

I meant 4 parameters for outputs of 4 functions: D, short-term S, S (success) and PLS

@user1823 (Contributor) commented Dec 26, 2024

Using the reviews of a card to adjust the memory states of its siblings is interesting. Waiting for the results.

I'm also curious about the method KAR3L (https://github.com/Pinafore/karl-flashcards-web-app) uses to update the recall probability of related cards. That method might also be worth trying for updating the memory states of siblings in FSRS. (Though we can't identify all related cards the way KAR3L can, we can at least identify the siblings.)

(By method, I mean the mathematical function.)
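To make the idea under discussion concrete, here is a minimal sketch of what "using the reviews of a card to adjust the memory states of its siblings" could look like. The function and the attenuation parameter `w_sib` are hypothetical illustrations, not part of FSRS or of this PR's actual implementation.

```python
# Hypothetical sketch: when one card of a note is reviewed, nudge the
# stability of its siblings by a fraction of that card's stability change.
# `w_sib` (spillover strength) is an illustrative parameter, not an FSRS one.

def adjust_sibling_stability(s_sibling: float, s_card_before: float,
                             s_card_after: float, w_sib: float = 0.2) -> float:
    """Shift a sibling's stability by w_sib times the reviewed card's
    stability change (0 = no spillover, 1 = full spillover).
    Stability is clamped to a small positive floor."""
    delta = s_card_after - s_card_before
    return max(0.1, s_sibling + w_sib * delta)

# Example: a successful review raised the card's stability from 10 to 25 days;
# a sibling at 8 days gets a partial boost.
print(adjust_sibling_stability(8.0, 10.0, 25.0))  # ≈ 11.0
```

A multiplicative update (scaling the sibling's stability rather than shifting it) would be an equally plausible form; which one works better is exactly what a benchmark like the one below would have to decide.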

@L-M-Sherlock (Member, Author)

Here is the result:

Model: FSRS-5-dev
Total number of users: 9999
Total number of reviews: 349923850
Weighted average by reviews:
FSRS-5-dev LogLoss (mean±std): 0.3270±0.1525
FSRS-5-dev RMSE(bins) (mean±std): 0.0507±0.0325
FSRS-5-dev AUC (mean±std): 0.7048±0.0759

Weighted average by log(reviews):
FSRS-5-dev LogLoss (mean±std): 0.3529±0.1697
FSRS-5-dev RMSE(bins) (mean±std): 0.0702±0.0459
FSRS-5-dev AUC (mean±std): 0.7022±0.0867

Weighted average by users:
FSRS-5-dev LogLoss (mean±std): 0.3563±0.1724
FSRS-5-dev RMSE(bins) (mean±std): 0.0732±0.0480
FSRS-5-dev AUC (mean±std): 0.7012±0.0888

parameters: [0.4469, 1.1877, 3.117, 15.691, 7.1265, 0.5157, 1.8096, 0.0099, 1.5118, 0.1426, 1.0036, 1.9168, 0.1062, 0.3007, 2.3378, 0.2321, 2.9899, 0.4549, 0.6006, 0.0128, 0.0964, 0.0, 0.0]

Model: FSRS-5
Total number of users: 9999
Total number of reviews: 349923850
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.3276±0.1526
FSRS-5 RMSE(bins) (mean±std): 0.0518±0.0333
FSRS-5 AUC (mean±std): 0.7010±0.0786

Weighted average by log(reviews):
FSRS-5 LogLoss (mean±std): 0.3534±0.1696
FSRS-5 RMSE(bins) (mean±std): 0.0713±0.0462
FSRS-5 AUC (mean±std): 0.6995±0.0887

Weighted average by users:
FSRS-5 LogLoss (mean±std): 0.3568±0.1721
FSRS-5 RMSE(bins) (mean±std): 0.0742±0.0479
FSRS-5 AUC (mean±std): 0.6986±0.0908

parameters: [0.4299, 1.162, 3.1897, 15.8179, 7.1441, 0.5397, 1.7835, 0.0104, 1.5175, 0.1351, 1.0064, 1.9183, 0.1007, 0.3016, 2.3446, 0.2315, 3.0117, 0.4463, 0.635]
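For readers unfamiliar with the benchmark output above, the three summaries differ only in how per-user metrics are weighted. This is a sketch of my reading of that aggregation (toy numbers, not the benchmark's actual code): each user contributes their metric weighted by their review count, by the log of their review count, or equally.

```python
import math

def weighted_mean(metrics, weights):
    """Weighted average of per-user metrics."""
    return sum(m * w for m, w in zip(metrics, weights)) / sum(weights)

logloss = [0.30, 0.40, 0.35]   # per-user LogLoss (toy numbers)
reviews = [10000, 100, 1000]   # per-user review counts (toy numbers)

by_reviews = weighted_mean(logloss, reviews)                    # heavy users dominate
by_log     = weighted_mean(logloss, [math.log(n) for n in reviews])
by_users   = weighted_mean(logloss, [1.0] * len(logloss))       # plain mean
```

Weighting by raw review counts favors users with long histories (who tend to be easier to predict), which is why the "by reviews" LogLoss above is lower than the "by users" one.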

@user1823 (Contributor)

So the impact is unfortunately very small, even smaller than the impact of recency weighting.

Now, I am even more curious about KAR3L. Maybe trying a similar method for updating memory states of siblings in FSRS would yield better results?

@L-M-Sherlock (Member, Author) commented Dec 26, 2024

@user1823 (Contributor) commented Dec 26, 2024

Does this mean that KAR3L is not using specific formulas but something like a neural network? If so, we can't take any inspiration from KAR3L. ☹️

@user1823 (Contributor)

BTW, what if we allow GRU-P to use the new data (containing reviews of siblings)? That could tell us how much improvement we can expect if we were somehow able to come up with a great formula.
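One way this experiment could be wired up, sketched below: merge the reviews of all cards of a note into one time-ordered sequence and add an is-sibling flag, so the network can learn to weight other-card reviews differently. This encoding is a hypothetical illustration, not the benchmark's actual feature set.

```python
# Hypothetical input encoding for a GRU-P-style model that sees sibling
# reviews: interleave all reviews of the note by timestamp, and mark
# which entries come from a sibling rather than the target card.

def build_sequence(card_id, note_reviews):
    """note_reviews: list of (timestamp, card_id, rating) for one note.
    Returns [(delta_t, rating, is_sibling), ...] ordered by time."""
    seq, last_t = [], None
    for t, cid, rating in sorted(note_reviews):
        delta_t = 0 if last_t is None else t - last_t
        seq.append((delta_t, rating, int(cid != card_id)))
        last_t = t
    return seq

reviews = [(0, "a", 3), (2, "b", 4), (5, "a", 3)]
print(build_sequence("a", reviews))  # [(0, 3, 0), (2, 4, 1), (3, 3, 0)]
```

Since a GRU consumes the sequence as-is, the same data pipeline would serve both the baseline (filtering out flagged rows) and the sibling-aware variant.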

@Expertium (Contributor)

@L-M-Sherlock what would the name be? GRU-P-Sibling? 😆

@L-M-Sherlock (Member, Author) commented Dec 27, 2024

The result is not promising:

Model: GRU-P-siblings
Total number of users: 9999
Total number of reviews: 349923850
Weighted average by reviews:
GRU-P-siblings LogLoss (mean±std): 0.3244±0.1509
GRU-P-siblings RMSE(bins) (mean±std): 0.0428±0.0288
GRU-P-siblings AUC (mean±std): 0.7033±0.0804

Weighted average by log(reviews):
GRU-P-siblings LogLoss (mean±std): 0.3493±0.1672
GRU-P-siblings RMSE(bins) (mean±std): 0.0605±0.0417
GRU-P-siblings AUC (mean±std): 0.6918±0.0928

Weighted average by users:
GRU-P-siblings LogLoss (mean±std): 0.3525±0.1696
GRU-P-siblings RMSE(bins) (mean±std): 0.0632±0.0437
GRU-P-siblings AUC (mean±std): 0.6893±0.0952

Model: GRU-P
Total number of users: 9999
Total number of reviews: 349923850
Weighted average by reviews:
GRU-P LogLoss (mean±std): 0.3251±0.1508
GRU-P RMSE(bins) (mean±std): 0.0433±0.0288
GRU-P AUC (mean±std): 0.6991±0.0812

Weighted average by log(reviews):
GRU-P LogLoss (mean±std): 0.3491±0.1666
GRU-P RMSE(bins) (mean±std): 0.0606±0.0413
GRU-P AUC (mean±std): 0.6889±0.0926

Weighted average by users:
GRU-P LogLoss (mean±std): 0.3521±0.1689
GRU-P RMSE(bins) (mean±std): 0.0633±0.0433
GRU-P AUC (mean±std): 0.6868±0.0946

@user1823 (Contributor)

> The result is not promising:

This suggests that we shouldn't focus on siblings (assuming there wasn't any bug in the benchmark).
