
The mechanism of optimization

Jarrett Ye edited this page Jul 19, 2023 · 5 revisions

FSRS is based on the DSR (Difficulty, Stability, Retrievability) memory model and is trained with backpropagation through time (BPTT) and maximum likelihood estimation (MLE).

The DSR model uses thirteen parameters and six equations to describe memory dynamics during spaced repetition practice (for details, see Free Spaced Repetition Scheduler).

Here is a brief introduction to the training process of FSRS.

Pre-processing Anki's review logs

The database schema of Anki's review logs is as follows (copied from Database Structure):

-- revlog is a review history; it has a row for every review you've ever done!
CREATE TABLE revlog (
    id              integer primary key,
       -- epoch-milliseconds timestamp of when you did the review
    cid             integer not null,
       -- cards.id
    usn             integer not null,
        -- update sequence number: for finding diffs when syncing. 
        --   See the description in the cards table for more info
    ease            integer not null,
       -- which button you pushed to score your recall. 
       -- review:  1(wrong), 2(hard), 3(ok), 4(easy)
       -- learn/relearn:   1(wrong), 2(ok), 3(easy)
    ivl             integer not null,
       -- interval (i.e. as in the card table; negative values are in seconds, positive values in days)
    lastIvl         integer not null,
       -- last interval (i.e. the last value of ivl. Note that this value is not necessarily equal to the actual interval between this review and the preceding review)
    factor          integer not null,
       -- ease factor of the card, stored as an integer (e.g. 2500 = 250%)
    time            integer not null,
       -- how many milliseconds your review took, up to 60000 (60s)
    type            integer not null
       --  0=learn, 1=review, 2=relearn, 3=cram
);

Each revlog row records that card cid was reviewed at time id with rating ease, so the entire review history of a card is easy to trace. The memory state of a card depends on that history. Grouping the revlogs by card, sorting them by time, and concatenating the ratings and the intervals between adjacent reviews yields a history of ratings and a history of intervals for each card. Here are some samples after pre-processing:

(figure: sample rating histories r_history and interval histories t_history after pre-processing)
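The grouping step can be sketched in plain Python. The sample rows and the helper name below are made up for illustration; a real pipeline would read the rows out of Anki's SQLite database instead:

```python
from collections import defaultdict

# Hypothetical sample revlog rows: (id = epoch-ms review timestamp, cid, ease)
MS_PER_DAY = 86_400_000
revlogs = [
    (MS_PER_DAY * 0, 1, 3),
    (MS_PER_DAY * 1, 1, 3),
    (MS_PER_DAY * 4, 1, 2),
    (MS_PER_DAY * 0, 2, 1),
    (MS_PER_DAY * 2, 2, 3),
]

def build_histories(rows):
    """Group revlogs by card, sort by review time, and emit the
    comma-separated rating and interval histories for each card."""
    by_card = defaultdict(list)
    for ts, cid, ease in rows:
        by_card[cid].append((ts, ease))
    histories = {}
    for cid, reviews in by_card.items():
        reviews.sort()  # chronological order
        r_history = ",".join(str(ease) for _, ease in reviews)
        # the interval before the first review is defined as 0
        days = [0] + [
            (reviews[i][0] - reviews[i - 1][0]) // MS_PER_DAY
            for i in range(1, len(reviews))
        ]
        t_history = ",".join(str(d) for d in days)
        histories[cid] = (r_history, t_history)
    return histories

histories = build_histories(revlogs)
# card 1 -> r_history "3,3,2", t_history "0,1,3"
```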

Training with time-series data

In machine-learning terms, the histories of ratings and intervals are time-series features. I want to use this time-series data to train the DSR model. The inputs are r_history and t_history, and the outputs are Stability and Difficulty. I built a time-series model in PyTorch to handle this data type, which automatically applies backpropagation through time (BPTT) during training.

(figure: architecture of the time-series model, unrolled over the review history)
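The essential idea — carrying a memory state through the history one review at a time — can be sketched without PyTorch. The update rules below are made-up placeholders, not the real FSRS equations; in the actual optimizer a PyTorch module performs these steps so that BPTT can flow gradients through every one of them:

```python
import math

def unroll(r_history, t_history, w=(1.0, 0.1, 0.3)):
    """Toy recurrent unroll: carry the memory state (S, D) through the
    review history step by step. The update rules are illustrative
    placeholders, not FSRS itself."""
    w0, w1, w2 = w
    S, D = w0, 5.0                             # initial stability and difficulty
    for rating, delta_t in zip(r_history, t_history):
        R = math.exp(-delta_t / S)             # retrievability at review time
        D = min(10.0, max(1.0, D + w2 * (3 - rating)))   # easy ratings lower D
        S = S * (1.0 + w1 * rating * (11.0 - D) * (2.0 - R))  # S grows each review
    return S, D

# Feed in the histories produced by the pre-processing step.
S, D = unroll([3, 3, 2], [0, 1, 3])
```

Because the state at step i depends on the state at step i-1, the unrolled computation forms exactly the kind of chain that BPTT differentiates through.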

Training from end to end

There is another problem with training the model on time-series data. One main goal of FSRS is to predict the memory state after a sequence of reviews. However, the time-series data records only ratings, not Stability or Difficulty. Stability is a statistical property of a group of reviews that share the same review history. Is it possible to estimate the stability of memory from the raw data? Maximum likelihood estimation (MLE) fits this case.

According to the formula of the forgetting curve, the probability of recall ( $r=1$ ) after $t$ days is $P(r=1,t|S) = e^{-\frac{t}{S}}$, and the probability of forgetting ( $r=0$ ) is $P(r=0,t|S) = 1 - e^{-\frac{t}{S}}$. Suppose $n$ reviews are done with the same stability $S$ and the same interval $t$: $p$ of them are recalled and $n-p$ are forgotten. How can we estimate the stability $S$?

According to the MLE method, $P(r=1,t|S)^p \cdot P(r=0,t|S)^{n-p}$ is the likelihood function of $S$. Taking the logarithm of the likelihood function, we obtain the log-likelihood:

$$p \cdot \log P(r=1,t|S) + (n-p) \cdot \log P(r=0,t|S)$$

Setting $n = 1$ (a single review, so $p \in \{0,1\}$ is the recall label) and negating, we obtain:

$$loss = - [p \cdot \log P(r=1,t|S) + (1-p) \cdot \log P(r=0,t|S)],$$

which is the loss function for updating $S$.
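The single-review loss can be written out directly. This is a small sketch where `S` is still just a scalar, not yet the output of the DSR model:

```python
import math

def forgetting_curve(t, S):
    """Probability of recall after t days at stability S: e^(-t/S)."""
    return math.exp(-t / S)

def single_review_loss(p, t, S):
    """Negative log-likelihood of one review, where p = 1 means the card
    was recalled and p = 0 means it was forgotten."""
    r = forgetting_curve(t, S)
    return -(p * math.log(r) + (1 - p) * math.log(1 - r))
```

For a recalled review the loss reduces to $t/S$, so reviews that survive long intervals push the estimate of $S$ upward, while lapses pull it down.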

At this point, we then replace $S$ with the DSR model and consider the case of multiple reviews.

$$P(r=1,t,\vec{r}_{i-1},\vec{t}_{i-1}|\vec\theta) = e^{-\frac{t}{DSR(\vec{r}_{i-1},\vec{t}_{i-1})}},$$

where $\vec\theta$ denotes the model parameters.

With MLE and BPTT, we can then train the parameters of DSR directly from the time-series data.
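To see MLE at work on the simple fixed-$S$ case before plugging in the full DSR model: with $n$ reviews at the same interval $t$, $p$ of them recalled, setting the derivative of the log-likelihood to zero gives $e^{-t/S} = p/n$, i.e. $S = -t / \ln(p/n)$. A brute-force grid search over candidate stabilities recovers the same value (the numbers below are made up for illustration):

```python
import math

def log_likelihood(S, t, n, p):
    """Log-likelihood of p recalls out of n reviews after t days at stability S."""
    r = math.exp(-t / S)
    return p * math.log(r) + (n - p) * math.log(1 - r)

t, n, p = 1.0, 100, 80                       # 80 of 100 reviews recalled after 1 day
grid = [s / 100 for s in range(10, 2000)]    # candidate stabilities 0.10 .. 19.99
best_S = max(grid, key=lambda S: log_likelihood(S, t, n, p))

closed_form = -t / math.log(p / n)           # the analytic maximum
```

In FSRS the scalar `S` is replaced by the DSR model's output, and gradient descent via BPTT takes the place of this grid search.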

Calculating the optimal retention

To get the optimal retention, we first need to define "optimal". In my paper, scheduling optimization is formulated as raising memory stability to a target level at minimum cost. For example, if algorithm A needs five reviews to stabilize a card to $S = 365$ (days) but algorithm B achieves it with only three reviews, then algorithm B is better.

In a simplified case, assume the scheduling algorithm is based only on a requested retention. It is then possible to enumerate each candidate retention, calculate its expected cost, and pick the retention with the minimum cost. For details, please see the following pseudo-code:

(figure: pseudo-code for enumerating retention levels and computing expected cost)

The matrix $J$ records the minimum cost for each (stability, difficulty) state. The matrix $\pi$ records the minimum cost for each (retention, difficulty) pair. The array $R^*$ records the optimal retention at each difficulty level. Finally, the optimal retention is obtained as a weighted average over $R^*$.
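The enumeration has the following shape. Everything numeric here is an illustrative placeholder — the growth rule, the costs, and the candidate range are made up, and the real optimizer also conditions on difficulty and uses the full DSR dynamics — but the structure (simulate each candidate retention, keep the cheapest) mirrors the pseudo-code:

```python
def simulate_cost(R, target_S=365.0, s0=1.0, recall_cost=1.0, forget_cost=3.0):
    """Expected review cost to grow stability from s0 to target_S when every
    review is scheduled at retention R. Growth rule and costs are toy
    placeholders, not the real FSRS cost model."""
    S, cost = s0, 0.0
    growth = 1.0 + 4.0 * (1.0 - R)     # lower retention -> bigger stability gain
    while S < target_S:
        # each review costs recall_cost on success, forget_cost on a lapse
        cost += R * recall_cost + (1.0 - R) * forget_cost
        S *= growth                    # expected stability after the review
    return cost

# Enumerate candidate retentions and pick the cheapest one.
candidates = [r / 100 for r in range(70, 100)]     # retention 0.70 .. 0.99
costs = {R: simulate_cost(R) for R in candidates}
optimal_R = min(costs, key=costs.get)
```

The trade-off the search balances is visible in the toy model: low retention means cheap, fast stability growth but frequent costly lapses; high retention means safe reviews that barely strengthen the memory.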