TL;DR: We propose Cog Attention, a novel attention mechanism that allows attention weights to be negative, enhancing expressiveness.
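For intuition, here is a minimal, self-contained sketch of one way attention weights can become negative: keep the sign of each query-key score and apply the softmax only to the magnitudes. The function name `signed_attention` and this particular normalization are illustrative assumptions for this sketch, not necessarily the exact formulation used by Cog Attention.

```python
import torch
import torch.nn.functional as F

def signed_attention(q, k, v):
    # Illustrative sketch (assumed formulation, not the paper's exact method):
    # scaled dot-product scores, as in standard attention.
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    # Softmax the magnitudes, then restore each score's sign, so each row's
    # weights sum to 1 in absolute value while individual weights can be < 0.
    weights = torch.sign(scores) * F.softmax(scores.abs(), dim=-1)
    return weights @ v

# Hypothetical shapes: batch=2, heads=4, seq_len=8, head_dim=16.
q = torch.randn(2, 4, 8, 16)
k = torch.randn(2, 4, 8, 16)
v = torch.randn(2, 4, 8, 16)
out = signed_attention(q, k, v)  # -> (2, 4, 8, 16)
```

Under this assumed scheme, a head can both add and subtract value vectors in a single step, which is the kind of extra expressiveness negative weights are meant to provide.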
Why is it named Cog Attention?
- The attention patterns look like cogs.
- "The transformation cog ('T-cog') and the living metal of each Transformer's body allow them to change from their natural robotic body into an 'alternate mode' based on a form of technology or life form that they have observed and scanned." (Wikipedia) In summary, the cog enhances the expressiveness of Transformers :)