>be us
>some hobbyist students in bioinformatics
>decide to predict transmembrane sequences because why not
>start with protein sequences, the usual suspects
>fire up our trusty Python script, let's get this bread
>SEPARATE the training and testing data like Moses parting the Red Sea
>some genius on Reddit says use deep learning
>download TensorFlow, nearly fry my laptop
>ERROR: CUDA memory exhausted
>realize my GPU is a potato
>fine, fall back to logistic regression, classic
>model accuracy: 98% in the first epoch
>turns out we were just overfitting to the noise
>spend the next three days tuning hyperparameters like some kind of mad scientists
>finally, model works
>predicts transmembrane regions with 70% accuracy
>publish results, get cited by a grand total of three people
>one of them is my mom
>boss says we need better results
>start manually curating the dataset
>dreaming of transmembrane helices
>wake up in a cold sweat, grab laptop, another all-nighter
>re-run the predictions
>accuracy drops to 50%
The files from the run described in the paper are available here