Releases: kingo233/FCT-Pytorch
FCT paper reproduction results
Experiment details
According to the FCT paper, the authors used filters = [16, 32, 64, 128, 384, 128, 64, 32, 16] and attn heads = [2, 4, 8, 12, 16, 12, 8, 4, 2]. With those settings I ran into the error `Embed_dim must be divisible by num_heads` (128 is not divisible by 12), so I modified attn heads to [2, 4, 8, 8, 16, 8, 8, 4, 2].
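For reference, the constraint is simply that each filter size must split evenly across its heads (the per-head dimension must be an integer). A quick check in plain Python shows exactly where the paper's configuration fails:

```python
filters     = [16, 32, 64, 128, 384, 128, 64, 32, 16]
heads_paper = [2, 4, 8, 12, 16, 12, 8, 4, 2]  # as reported in the paper
heads_used  = [2, 4, 8,  8, 16,  8, 8, 4, 2]  # adjusted so every stage divides evenly

for f, h in zip(filters, heads_paper):
    if f % h != 0:
        print(f"embed_dim={f} is not divisible by num_heads={h}")
# -> fails at both stages where embed_dim=128 pairs with num_heads=12

assert all(f % h == 0 for f, h in zip(filters, heads_used))
```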
When I trained with the author's original PyTorch code I ran into overfitting, so I use dropout = 0.5 in the Convolutional Attention blocks.
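As a rough sketch of where that dropout sits (the `ConvAttention` class below is a hypothetical stand-in, not the repo's actual module; it only illustrates applying dropout inside and after the attention):

```python
import torch.nn as nn

class ConvAttention(nn.Module):
    """Hypothetical sketch of an attention block with dropout=0.5;
    the real FCT module differs, this only shows where dropout is applied."""
    def __init__(self, channels, num_heads, dropout=0.5):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads,
                                          dropout=dropout, batch_first=True)
        self.drop = nn.Dropout(dropout)   # extra dropout on the attention output

    def forward(self, x):                 # x: (batch, tokens, channels)
        out, _ = self.attn(x, x, x)
        return self.drop(out)
```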
What's more, I scaled the gradients by 1e4 to fight gradient vanishing, which otherwise made the Dice score stay flat. The vanishing is still not completely solved: you can still see it in some parts of block_1 in TensorBoard.
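One common way to implement that kind of scaling (a sketch based on my description above, not necessarily how this repo does it) is to multiply the loss by the factor before `backward()` and divide the gradients back down before the optimizer step, much like static loss scaling in mixed-precision training:

```python
import torch

GRAD_SCALE = 1e4  # the factor mentioned above

def training_step(model, images, labels, loss_fn, optimizer):
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    (loss * GRAD_SCALE).backward()     # amplify tiny gradients so they survive
    for p in model.parameters():       # undo the scaling before stepping
        if p.grad is not None:
            p.grad /= GRAD_SCALE
    optimizer.step()
    return loss.item()
```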
How to use
Run `tar -xvf dice_87.tar` and you will get a directory called `output_model`.
The FCT model is at `output_model/$time/model/fct.pt`; you may notice that it is smaller than the original author's pretrained model.
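To inspect the checkpoint, something like the following should work (whether `fct.pt` stores a full serialized module or a state_dict is an assumption; check the type after loading):

```python
import torch

# "$time" stands for the timestamped run directory inside output_model/.
ckpt = torch.load("output_model/$time/model/fct.pt", map_location="cpu")
print(type(ckpt))  # a full nn.Module, or a state_dict to load into the FCT model
```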
Run `pip install tensorboard` to install TensorBoard, then `cd output_model && tensorboard --logdir .` to visualize my training process in your browser (by default at http://localhost:6006).