-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recovery module #6
Comments
Hi @phixerino ! Yes, you understand correctly. The discriminator might have been cool to also implement but in the end it was more straightforward to just use the estimator head when doing end-to-end training. The descriptor that is used by the recovery module is the one returned by the SEnSeI model. Specifically, it depends on the self.descriptor_vectorizer attribute of the SEnSeI model. See here: Lines 61 to 66 in eba151d
....which is used to process the descriptors in the forward call: Lines 73 to 77 in eba151d
Let me know if that's clear or if you need any other info. As for the possible mistake, I think you're right! It should clearly be postconcatenation_layer_sizes there. Thanks for spotting it! |
Thank you for your response. Right, I thought that the recovery module has SEnSeIv2 output and original descriptors as inputs, which would make sense (and its presented like this in the paper). And the original descriptors in this case would be outputs from SEnSeIv2Descriptors. But in the code the descriptors that enter the recovery module are the ones returned from SEnSeIv2, which would mean they are not the original descriptors, because they are transformed by GlobalStats, FCLBlock and AttentionBlock. Am I reading the code incorrectly or is this expected behavior? And regarding my question "And in the code I see that there are preconcatenation layers just for the descriptor, is there a specific reason for that?", now that I think about it, its to transform the descriptor to 32-dim vector so it can be concatenated with the embedding, right? |
Sorry for the barrage of questions, but I have come across a few more things:
|
Two more things:
Sorry for so much nitpicking, but I am changing the code a bit for deployment on my target HW, and I want to make sure that my understanding of both the paper and the code is correct. |
Oh dear, there seems to be some quite large bugs here that you've uncovered - but lots of possibilities to improve things! You're right about the descriptors being transformed by SEnSeI. I somewhat lazily refer to the vectors as "descriptors" throughout the model, and indeed the descriptors given to the recovery module are in fact the vectors outputted by SEnSeI which correspond to the original descriptors. You are completely right about the skip being skipped in the attention layer, that's just a straight-up typo and should be fixed. I worry though that doing so will break the trained weights that are already on HuggingFace. I will probably update their config files to not include skips, then change the code to allow for real skips. Maybe I will just retrain the models and upload a new version, too. Same goes for the batch_first=True point. I didn't even consider that the TransformerEncoderLayer class would expect (seq, batch, feature) in the dimension order... seems very unintuitive as the default (for someone like me who is not in NLP, at least, it seems weird). So, that is very interesting, that essentially the models as they are currently do not have information shared across bands... As for the sum vs. mean and the batch norm at the very end of the model: I know that mathematically sum and mean are not equivalent, but they achieve essentially the same thing in this case, especially when followed by a normalization step. To be honest the batch norm was probably a design choice based on trail-and-error during development, where it seemed to work better with it than without and so it stayed there. Overall, I think this merits a debugged version in a new branch and then a training run to see if it makes any/much difference. I will begin the process now and feel free to jump in and suggest any other changes on that branch. |
(Closed by mistake). Here is the branch that I will work on |
Great, hopefully it will lead to better results. So the input to the recovery module should be the descriptor from AttentionBlock? And should the input wavelength be normalized before spectral encoding? |
Some interesting findings from the debugged version on the new branch. I trained a sensor independent model that is equivalent to the one in the paper:
This experiment was run on my old cluster and we are now in the process of setting up our new one in a new org. So, when that's set up and I have time, I will retrain all the models and update the configs with the skip argument. I'll keep all the old model weights too for reproducibility of the paper. But it might take me some months to do this, depending on how things go with our computing server. The good news is, you're not missing out on any big performance improvements by using the current weights, but if you are training from scratch then you may well want to use the debugged model definition from the new branch to get faster training convergence. RE your question about the attention block inputs: the "descriptors" given to the recovery module are the original, spectrally encoded descriptors that are given to SEnSeI, not the vectors produced by SEnSeI which are transformations of those descriptors. The input wavelengths are normalised using the log formula in the paper, then encoded. |
Hi, I have couple questions regarding the recovery module:
Do I understand it correctly that the discriminator from SEnSeI-v1 is not used and instead only the estimator is used during the main supervised training?
The descriptor entering the recovery module is coming from which block?
And in the code I see that there are preconcatenation layers just for the descriptor, is there a specific reason for that?
Also there might be a mistake in the code:
SEnSeIv2/senseiv2/models.py
Lines 557 to 559 in eba151d
The text was updated successfully, but these errors were encountered: