
SAINT on Ednet data #4

Open
clara2911 opened this issue Jun 3, 2021 · 12 comments

@clara2911

Would you be able to add your code for running your implementation of SAINT on EdNet as well, besides the example on random data?

@kwonmha

kwonmha commented Feb 16, 2022

It would be great to have the entire code for reproducing the results from the paper as well, because I failed to reach that performance with my implementation.
Mine was about 55, compared to 78 from the paper.

@Nino-SEGALA

Hi kwonmha,
I'm also trying to reproduce the paper :)

This implementation of SAINT is not completely finished.
For example, dropout is not added, the position embeddings are wrongly added in every layer (instead of only at the input to the first encoder/decoder layer), and the LayerNorm should be placed as in the Attention Is All You Need paper (after the multi-head attention and after the FFN).
You can also change the position encoding to the sinusoidal one from Attention Is All You Need.
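For reference, a minimal sketch of that sinusoidal encoding, added once to the input embeddings rather than in every layer, with dropout applied to the sum (assuming PyTorch; the module name is hypothetical):

    import math
    import torch
    import torch.nn as nn

    class SinusoidalPositionalEncoding(nn.Module):
        """Fixed sinusoidal encoding from Attention Is All You Need,
        added a single time at the model input."""
        def __init__(self, d_model, max_len=5000, dropout=0.1):
            super().__init__()
            self.dropout = nn.Dropout(dropout)
            position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
            div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                                 * (-math.log(10000.0) / d_model))
            pe = torch.zeros(max_len, d_model)
            pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
            pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
            self.register_buffer("pe", pe)

        def forward(self, x):  # x: (batch, seq_len, d_model)
            x = x + self.pe[: x.size(1)]
            return self.dropout(x)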

All the rest is correct :) I reach AUC=0.76 with it, but I'm not able to get the last 2%. Also, my metrics crash if I use a dimension_model of 512 as in the paper (it only works with a smaller model).
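And for the LayerNorm and dropout placement mentioned above, a sketch of the post-LN sublayer ordering (again assuming PyTorch; this is illustrative, not the repository's actual code):

    import torch.nn as nn

    class PostLNEncoderLayer(nn.Module):
        """Post-LN ordering: x = LayerNorm(x + Dropout(Sublayer(x))),
        applied after the multi-head attention and after the FFN."""
        def __init__(self, d_model, n_heads, d_ff, dropout=0.1):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads,
                                              dropout=dropout, batch_first=True)
            self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                     nn.Linear(d_ff, d_model))
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)
            self.drop = nn.Dropout(dropout)

        def forward(self, x, attn_mask=None):
            attn_out, _ = self.attn(x, x, x, attn_mask=attn_mask)
            x = self.norm1(x + self.drop(attn_out))     # LayerNorm after attention
            x = self.norm2(x + self.drop(self.ffn(x)))  # LayerNorm after the FFN
            return x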

@kwonmha

kwonmha commented Apr 14, 2022

Hi, @Nino-SEGALA
Thanks for the information.

In my case, I think the problem lies in the data processing or the data itself, not in the modeling, because my model works fine with the EdNet data from Kaggle.

Do you have any plans to upload your code to your GitHub?

@Nino-SEGALA

I will try to upload it here with a pull request :)

I don't understand: it works with EdNet from Kaggle, but not with EdNet from the paper?
What is the difference between them?
Can you link both datasets? :)

@kwonmha

kwonmha commented Apr 18, 2022

@Nino-SEGALA Here's the link to the dataset I mentioned.
https://github.com/riiid/ednet
It's KT-1, and you also need to download the content data.

@Nino-SEGALA

Yes, I also use this one (and get 0.76 AUC with dim_model=128; if I use a larger model with dim_model=512, I get AUC=0.5 too :/).

Maybe you can try with a smaller model.

And which dataset did you mean by "my model works fine with Ednet data from Kaggle"? :)

@kwonmha

kwonmha commented Apr 19, 2022

Thanks for the information!

@Nino-SEGALA

@kwonmha
Here's the corrected code for SAINT:
#6

Let me know if you manage to train SAINT with a large model dimension (d_model=512) :D

@kwonmha

kwonmha commented May 25, 2022

@Nino-SEGALA Have you tried applying the Noam learning-rate scheduling scheme mentioned in the paper?
It's in the Training Details section.

I had the same problem where the AUC stays around 0.5 with dimensions 256 and 512, and the validation AUC goes above 0.7 with the Noam scheme.
It looks necessary for training a large Transformer model.

Noam scheduler code link
I used Lina Achaji's code, and for convenience I added the following method to the scheduler class:

    def zero_grad(self):
        # delegate to the wrapped optimizer so the scheduler can be
        # used as a drop-in replacement in the training loop
        self.optimizer.zero_grad()


Since it changes the learning rate with respect to the step count, batch_size looks important, as it affects the number of steps in training.

I got 0.7746 AUC with dim 256 and 0.7727 with dim 512.
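For anyone reproducing this, a minimal sketch of such a Noam wrapper, using the rate formula from Attention Is All You Need (an illustration under those assumptions, with hypothetical names, not Lina Achaji's actual code):

    class NoamScheduler:
        """Noam schedule: lr = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5),
        recomputed at every optimizer step."""
        def __init__(self, optimizer, d_model, warmup=4000):
            self.optimizer = optimizer
            self.d_model = d_model
            self.warmup = warmup
            self.step_num = 0

        def step(self):
            self.step_num += 1
            lr = self.d_model ** -0.5 * min(self.step_num ** -0.5,
                                            self.step_num * self.warmup ** -1.5)
            for group in self.optimizer.param_groups:
                group["lr"] = lr  # overwrite lr before the actual update
            self.optimizer.step()

        def zero_grad(self):
            # the convenience delegate mentioned above
            self.optimizer.zero_grad()

A typical use would be sched = NoamScheduler(torch.optim.Adam(model.parameters(), lr=0, betas=(0.9, 0.98), eps=1e-9), d_model=512), calling sched.zero_grad() and sched.step() each batch; the Adam settings and warmup=4000 are the Attention Is All You Need defaults, not necessarily what was used here.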

@Nino-SEGALA

Nino-SEGALA commented Jun 4, 2022

Thanks a lot for your comment, kwonmha!

I originally trained without the Noam scheme, and after implementing it I hadn't retried the big trainings until now.
It is indeed what makes the difference!
I didn't reach metrics as high as yours, but my model didn't train until convergence (it stopped a bit early). I'll let you know when I have my final results :D

@kwonmha could you also share your ACC, RMSE, and BCE loss if you have them?

@kwonmha

kwonmha commented Jun 8, 2022

Sorry, but I haven't measured metrics other than AUC so far.

@Nino-SEGALA

I got 0.7666 AUC with dim 256 and 0.7537 with dim 512.
And my dim-512 training crashed afterwards (AUC=0.6), even though it now uses the Noam scheme :/
