Expt/siblings #142

Draft · wants to merge 6 commits into main
Conversation

@L-M-Sherlock (Member)
No description provided.

@Expertium (Contributor)
As I said on Discord:

I meant 4 parameters for outputs of 4 functions: D, short-term S, S (success) and PLS

@user1823 (Contributor) commented Dec 26, 2024

Using the reviews of a card to adjust the memory states of its siblings is interesting. Waiting for the results.

I'm also curious about the method KAR3L (https://github.com/Pinafore/karl-flashcards-web-app) uses to update the recall probability of related cards. That method might also be worth trying for updating the memory states of siblings in FSRS. (Though we can't identify all related cards the way KAR3L can, we can at least identify the siblings.)

(By method, I mean the mathematical function.)
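To make the idea under discussion concrete, here is a minimal sketch of what "using the reviews of a card to adjust the memory states of its siblings" could look like. The function and the attenuation parameter `w_sib` are hypothetical illustrations, not part of FSRS or of this PR's actual implementation.

```python
# Hypothetical sketch: when one card of a note is reviewed, nudge the
# stability of its siblings by a fraction of that card's stability change.
# `w_sib` (spillover strength) is an illustrative parameter, not an FSRS one.

def adjust_sibling_stability(s_sibling: float, s_card_before: float,
                             s_card_after: float, w_sib: float = 0.2) -> float:
    """Shift a sibling's stability by w_sib times the reviewed card's
    stability change (0 = no spillover, 1 = full spillover).
    Stability is clamped to a small positive floor."""
    delta = s_card_after - s_card_before
    return max(0.1, s_sibling + w_sib * delta)

# Example: a successful review raised the card's stability from 10 to 25 days;
# a sibling at 8 days gets a partial boost.
print(adjust_sibling_stability(8.0, 10.0, 25.0))  # ≈ 11.0
```

A multiplicative update (scaling the sibling's stability rather than shifting it) would be an equally plausible form; which one works better is exactly what a benchmark like the one below would have to decide.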

@L-M-Sherlock (Member, Author)

Here is the result:

Model: FSRS-5-dev
Total number of users: 9999
Total number of reviews: 349923850
Weighted average by reviews:
FSRS-5-dev LogLoss (mean±std): 0.3270±0.1525
FSRS-5-dev RMSE(bins) (mean±std): 0.0507±0.0325
FSRS-5-dev AUC (mean±std): 0.7048±0.0759

Weighted average by log(reviews):
FSRS-5-dev LogLoss (mean±std): 0.3529±0.1697
FSRS-5-dev RMSE(bins) (mean±std): 0.0702±0.0459
FSRS-5-dev AUC (mean±std): 0.7022±0.0867

Weighted average by users:
FSRS-5-dev LogLoss (mean±std): 0.3563±0.1724
FSRS-5-dev RMSE(bins) (mean±std): 0.0732±0.0480
FSRS-5-dev AUC (mean±std): 0.7012±0.0888

parameters: [0.4469, 1.1877, 3.117, 15.691, 7.1265, 0.5157, 1.8096, 0.0099, 1.5118, 0.1426, 1.0036, 1.9168, 0.1062, 0.3007, 2.3378, 0.2321, 2.9899, 0.4549, 0.6006, 0.0128, 0.0964, 0.0, 0.0]

Model: FSRS-5
Total number of users: 9999
Total number of reviews: 349923850
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.3276±0.1526
FSRS-5 RMSE(bins) (mean±std): 0.0518±0.0333
FSRS-5 AUC (mean±std): 0.7010±0.0786

Weighted average by log(reviews):
FSRS-5 LogLoss (mean±std): 0.3534±0.1696
FSRS-5 RMSE(bins) (mean±std): 0.0713±0.0462
FSRS-5 AUC (mean±std): 0.6995±0.0887

Weighted average by users:
FSRS-5 LogLoss (mean±std): 0.3568±0.1721
FSRS-5 RMSE(bins) (mean±std): 0.0742±0.0479
FSRS-5 AUC (mean±std): 0.6986±0.0908

parameters: [0.4299, 1.162, 3.1897, 15.8179, 7.1441, 0.5397, 1.7835, 0.0104, 1.5175, 0.1351, 1.0064, 1.9183, 0.1007, 0.3016, 2.3446, 0.2315, 3.0117, 0.4463, 0.635]
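For readers unfamiliar with the benchmark output above, the three summaries differ only in how per-user metrics are weighted. This is a sketch of my reading of that aggregation (toy numbers, not the benchmark's actual code): each user contributes their metric weighted by their review count, by the log of their review count, or equally.

```python
import math

def weighted_mean(metrics, weights):
    """Weighted average of per-user metrics."""
    return sum(m * w for m, w in zip(metrics, weights)) / sum(weights)

logloss = [0.30, 0.40, 0.35]   # per-user LogLoss (toy numbers)
reviews = [10000, 100, 1000]   # per-user review counts (toy numbers)

by_reviews = weighted_mean(logloss, reviews)                    # heavy users dominate
by_log     = weighted_mean(logloss, [math.log(n) for n in reviews])
by_users   = weighted_mean(logloss, [1.0] * len(logloss))       # plain mean
```

Weighting by raw review counts favors users with long histories (who tend to be easier to predict), which is why the "by reviews" LogLoss above is lower than the "by users" one.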

@user1823 (Contributor)

So the impact is unfortunately very small, even smaller than the impact of recency weighting.

Now, I am even more curious about KAR3L. Maybe trying a similar method for updating memory states of siblings in FSRS would yield better results?

@L-M-Sherlock (Member, Author) commented Dec 26, 2024

@user1823 (Contributor) commented Dec 26, 2024

Does this mean that KAR3L is not using specific formulas but something like a neural network? If so, we can't take any inspiration from KAR3L. ☹️

@user1823 (Contributor)

BTW, what if we allow GRU-P to use the new data (containing reviews of siblings)? That could tell us how much improvement we can expect if we were somehow able to come up with a great formula.
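One way this experiment could be wired up, sketched below: merge the reviews of all cards of a note into one time-ordered sequence and add an is-sibling flag, so the network can learn to weight other-card reviews differently. This encoding is a hypothetical illustration, not the benchmark's actual feature set.

```python
# Hypothetical input encoding for a GRU-P-style model that sees sibling
# reviews: interleave all reviews of the note by timestamp, and mark
# which entries come from a sibling rather than the target card.

def build_sequence(card_id, note_reviews):
    """note_reviews: list of (timestamp, card_id, rating) for one note.
    Returns [(delta_t, rating, is_sibling), ...] ordered by time."""
    seq, last_t = [], None
    for t, cid, rating in sorted(note_reviews):
        delta_t = 0 if last_t is None else t - last_t
        seq.append((delta_t, rating, int(cid != card_id)))
        last_t = t
    return seq

reviews = [(0, "a", 3), (2, "b", 4), (5, "a", 3)]
print(build_sequence("a", reviews))  # [(0, 3, 0), (2, 4, 1), (3, 3, 0)]
```

Since a GRU consumes the sequence as-is, the same data pipeline would serve both the baseline (filtering out flagged rows) and the sibling-aware variant.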

@Expertium (Contributor)

@L-M-Sherlock what would the name be? GRU-P-Sibling? 😆

@L-M-Sherlock (Member, Author) commented Dec 27, 2024

The result is not promising:

Model: GRU-P-siblings
Total number of users: 9999
Total number of reviews: 349923850
Weighted average by reviews:
GRU-P-siblings LogLoss (mean±std): 0.3244±0.1509
GRU-P-siblings RMSE(bins) (mean±std): 0.0428±0.0288
GRU-P-siblings AUC (mean±std): 0.7033±0.0804

Weighted average by log(reviews):
GRU-P-siblings LogLoss (mean±std): 0.3493±0.1672
GRU-P-siblings RMSE(bins) (mean±std): 0.0605±0.0417
GRU-P-siblings AUC (mean±std): 0.6918±0.0928

Weighted average by users:
GRU-P-siblings LogLoss (mean±std): 0.3525±0.1696
GRU-P-siblings RMSE(bins) (mean±std): 0.0632±0.0437
GRU-P-siblings AUC (mean±std): 0.6893±0.0952

Model: GRU-P
Total number of users: 9999
Total number of reviews: 349923850
Weighted average by reviews:
GRU-P LogLoss (mean±std): 0.3251±0.1508
GRU-P RMSE(bins) (mean±std): 0.0433±0.0288
GRU-P AUC (mean±std): 0.6991±0.0812

Weighted average by log(reviews):
GRU-P LogLoss (mean±std): 0.3491±0.1666
GRU-P RMSE(bins) (mean±std): 0.0606±0.0413
GRU-P AUC (mean±std): 0.6889±0.0926

Weighted average by users:
GRU-P LogLoss (mean±std): 0.3521±0.1689
GRU-P RMSE(bins) (mean±std): 0.0633±0.0433
GRU-P AUC (mean±std): 0.6868±0.0946

@user1823 (Contributor)

> The result is not promising:

This suggests that we shouldn't focus on siblings (assuming there wasn't any bug in the benchmark).
