Releases: kingo233/FCT-Pytorch
FCT paper reproduction results
Experiment details
According to the FCT paper, the authors used filters = [16, 32, 64, 128, 384, 128, 64, 32, 16] and attn heads = [2, 4, 8, 12, 16, 12, 8, 4, 2]. With those settings I ran into the error `Embed_dim must be divisible by num_heads` (128 is not divisible by 12), so I modified attn heads to [2, 4, 8, 8, 16, 8, 8, 4, 2].
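For reference, the constraint is simply that each filter size must split evenly across its heads (the per-head dimension must be an integer). A quick check in plain Python shows exactly where the paper's configuration fails:

```python
filters     = [16, 32, 64, 128, 384, 128, 64, 32, 16]
heads_paper = [2, 4, 8, 12, 16, 12, 8, 4, 2]  # as reported in the paper
heads_used  = [2, 4, 8,  8, 16,  8, 8, 4, 2]  # adjusted so every stage divides evenly

for f, h in zip(filters, heads_paper):
    if f % h != 0:
        print(f"embed_dim={f} is not divisible by num_heads={h}")
# -> fails at both stages where embed_dim=128 pairs with num_heads=12

assert all(f % h == 0 for f, h in zip(filters, heads_used))
```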
When I trained with the author's original PyTorch code I ran into overfitting, so I use dropout = 0.5 in the Convolutional Attention blocks.
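As a rough sketch of where that dropout sits (the `ConvAttention` class below is a hypothetical stand-in, not the repo's actual module; it only illustrates applying dropout inside and after the attention):

```python
import torch.nn as nn

class ConvAttention(nn.Module):
    """Hypothetical sketch of an attention block with dropout=0.5;
    the real FCT module differs, this only shows where dropout is applied."""
    def __init__(self, channels, num_heads, dropout=0.5):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads,
                                          dropout=dropout, batch_first=True)
        self.drop = nn.Dropout(dropout)   # extra dropout on the attention output

    def forward(self, x):                 # x: (batch, tokens, channels)
        out, _ = self.attn(x, x, x)
        return self.drop(out)
```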
What's more, I scaled the gradients by 1e4 to fight gradient vanishing, which otherwise made the Dice score stay flat. The vanishing is still not completely solved: you can still see it in some parts of block_1 in TensorBoard.
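One common way to implement that kind of scaling (a sketch based on my description above, not necessarily how this repo does it) is to multiply the loss by the factor before `backward()` and divide the gradients back down before the optimizer step, much like static loss scaling in mixed-precision training:

```python
import torch

GRAD_SCALE = 1e4  # the factor mentioned above

def training_step(model, images, labels, loss_fn, optimizer):
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    (loss * GRAD_SCALE).backward()     # amplify tiny gradients so they survive
    for p in model.parameters():       # undo the scaling before stepping
        if p.grad is not None:
            p.grad /= GRAD_SCALE
    optimizer.step()
    return loss.item()
```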
How to use
Run `tar -xvf dice_87.tar` and you will get a directory called `output_model`.
The FCT model is at `output_model/$time/model/fct.pt`; you may notice that it is smaller than the original author's pretrained model.
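To inspect the checkpoint, something like the following should work (whether `fct.pt` stores a full serialized module or a state_dict is an assumption; check the type after loading):

```python
import torch

# "$time" stands for the timestamped run directory inside output_model/.
ckpt = torch.load("output_model/$time/model/fct.pt", map_location="cpu")
print(type(ckpt))  # a full nn.Module, or a state_dict to load into the FCT model
```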
Run `pip install tensorboard` to install TensorBoard, then `cd output_model && tensorboard --logdir .` to visualize my training process in your browser (by default at http://localhost:6006).