Sergey cheremshinsky #9

Open
wants to merge 31 commits into
base: main
Conversation

sergTch
Collaborator

@sergTch sergTch commented Jan 31, 2021

Description

Start - Preprocessing the data by splitting categorical columns. Example:
Embark, which has the values S, Q, C, is split into
Embark_S, Embark_Q, Embark_C with values 0 or 1
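
The column splitting described above is one-hot encoding. A minimal sketch in plain Python, assuming rows are dictionaries; the helper name `one_hot` and the sample rows are hypothetical and not taken from the PR:

```python
def one_hot(rows, column):
    """Replace `column` in each row dict with 0/1 indicator columns."""
    # Collect the distinct categories once so every row gets the same columns.
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(row[column] == category)
        encoded.append(new_row)
    return encoded


# Hypothetical sample rows; only the Embark values S, Q, C come from the description.
passengers = [
    {"PassengerId": 1, "Embark": "S"},
    {"PassengerId": 2, "Embark": "C"},
    {"PassengerId": 3, "Embark": "Q"},
]
encoded = one_hot(passengers, "Embark")
```

With pandas, `pandas.get_dummies` performs the same expansion on a DataFrame column.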

Train - wrote a function that trains 5 models from sklearn and lets them vote;
on the validation set it shows ~82.5% accuracy.
Also built a neural network using TensorFlow with ~86% accuracy.
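
The voting step can be sketched as follows. The description does not say how the votes are combined, so this assumes plain majority (hard) voting; the function name and the prediction lists are hypothetical:

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by majority vote."""
    combined = []
    # zip(*...) groups the predictions of all models for the same sample.
    for sample_votes in zip(*predictions_per_model):
        label, _count = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions from five models for four passengers (1 = survived).
model_predictions = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
]
voted = majority_vote(model_predictions)
```

An odd number of models avoids ties for binary labels. scikit-learn also offers `sklearn.ensemble.VotingClassifier` for the same purpose.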

However, both attempts show only 76-80% accuracy on Kaggle.

How has this been tested?

  • Executed the code a few times from start to end without crashing
  • Only ran once
  • Code has dedicated unit tests

Collaborator

@meanalexrin meanalexrin left a comment

LGTM - Looks Good To Me
Except:
Please move your work to the /[username]_code location, as described in the readme
https://github.com/rnd4u-org/2021-knu-cairl#development

I'll approve after that.
Also, using separate .py files rather than a single Jupyter notebook is a good approach.

main.py Outdated
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
x_train = x
y_train = y
# exit()
Collaborator

Usually it's better to remove unused code from the project. It helps readability

main.py Outdated
print('Exported!')


generateAns()
Collaborator

You may consider using if __name__ == "__main__" in the python code to indicate entry points to the program

https://stackoverflow.com/a/419185
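
A minimal sketch of the suggested entry-point guard. `generate_answers` is a hypothetical stand-in for the PR's `generateAns()`, whose body is not shown here:

```python
def generate_answers():
    """Hypothetical stand-in for the PR's generateAns(); returns a status string."""
    return "Exported!"


def main():
    print(generate_answers())


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    main()
```

With the guard in place, another module can import `generate_answers` without triggering the export as a side effect.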

@sergTch sergTch requested a review from ARKAD97 as a code owner February 3, 2021 21:18
Collaborator Author

sergTch commented Feb 3, 2021

I fixed the placement of the files and added a solution for mask classification.
I didn't add if __name__ == "__main__": to the Titanic solution, but used it in the masks one.
In the new project I also deleted all code comments, though I regretted it a few times. Need to get used to it :)

Collaborator Author

sergTch commented Feb 3, 2021

On the masks dataset I used Inception v3 with dense layers of 1024, 512 and 128 units. Trained in 3 steps:
first freezing layers 0:210, then 0:105 and 210:316, and finally 105:316.
That helped to train all parts of the CNN and achieve 100% accuracy on the training data and 99.4% on the test data.
At the start I split the data into test and train sets, where test is 20% of the dataset and train is the other 80%.
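
The three-step freezing schedule can be sketched as below. This is an illustration under assumptions: the quoted ranges are read as half-open layer-index ranges, 316 is taken to be the total layer count, and `freeze_ranges` is a hypothetical helper. Stand-in objects replace the real model so the sketch runs without TensorFlow.

```python
from types import SimpleNamespace


def freeze_ranges(layers, frozen_ranges):
    """Set layer.trainable to False inside the given [start, stop) ranges, True elsewhere."""
    frozen = set()
    for start, stop in frozen_ranges:
        frozen.update(range(start, stop))
    for index, layer in enumerate(layers):
        layer.trainable = index not in frozen


# The three stages quoted in the comment, read as half-open index ranges (an assumption).
STAGES = [
    [(0, 210)],
    [(0, 105), (210, 316)],
    [(105, 316)],
]

# Stand-in objects so the sketch runs without TensorFlow; with Keras, pass model.layers.
layers = [SimpleNamespace(trainable=True) for _ in range(316)]

trainable_counts = []
for frozen_ranges in STAGES:
    freeze_ranges(layers, frozen_ranges)
    # With a real Keras model: call model.compile(...) again here, then model.fit(...).
    trainable_counts.append(sum(layer.trainable for layer in layers))
```

Under this reading each third of the network is trainable in exactly one stage, which matches the stated goal of training all parts of the CNN. In Keras, a change to `trainable` takes effect only after the model is compiled again.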

Collaborator

@meanalexrin meanalexrin left a comment

Please check other comments

Regarding task 3:
You can try rerunning the model a few times without saving the .npy indexes, so that you test the model on a few different train/test splits.
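
One way to sketch this suggestion is to vary `random_state` in scikit-learn's `train_test_split` and collect a score per split. The helper names, the toy data, and the majority-class "model" are hypothetical stand-ins for the PR's training code:

```python
from statistics import mean, stdev

from sklearn.model_selection import train_test_split


def evaluate_over_splits(x, y, train_and_score, seeds=(0, 1, 2, 3, 4), test_size=0.2):
    """Run train_and_score on a fresh random split per seed and collect the scores."""
    scores = []
    for seed in seeds:
        x_train, x_test, y_train, y_test = train_test_split(
            x, y, test_size=test_size, random_state=seed
        )
        scores.append(train_and_score(x_train, x_test, y_train, y_test))
    return scores


def majority_baseline(x_train, x_test, y_train, y_test):
    """Hypothetical stand-in model: always predicts the most common training label."""
    guess = max(set(y_train), key=y_train.count)
    return sum(label == guess for label in y_test) / len(y_test)


# Toy data; in the PR this would be the mask images and labels.
x = list(range(50))
y = [i % 2 for i in x]
scores = evaluate_over_splits(x, y, majority_baseline)
summary = (mean(scores), stdev(scores))
```

Reporting the mean and spread over several splits shows whether a single result such as 99.4% depends on one lucky split.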

y_test = np.load('y_test.npy', allow_pickle=True)

return (x_train, x_test, y_train, y_test)
except Exception:
Collaborator

Try to catch more specific exceptions, or use if statements instead.
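
A minimal sketch of the narrower handling, assuming the `except` branch exists to rebuild the cached split when the files are missing (the PR's full function is not shown). `pickle` stands in for the PR's `np.load`, and `load_or_create` is a hypothetical helper:

```python
import pickle
import tempfile
from pathlib import Path


def load_or_create(path, create):
    """Return the cached object at `path`, creating and caching it if the file is missing."""
    try:
        with open(path, "rb") as handle:
            return pickle.load(handle)
    except FileNotFoundError:
        # Only a missing cache file is expected; any other error still surfaces.
        value = create()
        with open(path, "wb") as handle:
            pickle.dump(value, handle)
        return value


# Usage with a temporary directory and a hypothetical split stand-in.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "split.pkl"
    first = load_or_create(cache, lambda: {"test_indexes": [3, 1, 4]})
    second = load_or_create(cache, lambda: {"test_indexes": []})
```

The second call reads the cached value instead of calling `create`. The if-statement variant would check `Path(path).exists()` before loading.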

epochs=10,
validation_data=(x_test, y_test),
validation_steps=1511 // batch,
steps_per_epoch=6042 // batch,
Collaborator

Please add named constants and a description for magic numbers like 1511 and 6042
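
A sketch of how the literals could be named. It assumes that 6042 and 1511 are the train and test sample counts, which would be consistent with the 80/20 split mentioned in this thread; the batch size of 32 is a hypothetical value because the PR's `batch` is not shown:

```python
# Assumed meaning: sample counts of the 80/20 train/test split.
TRAIN_SAMPLES = 6042
TEST_SAMPLES = 1511
BATCH_SIZE = 32  # hypothetical value; the PR's `batch` is not shown

# Number of full batches per pass over each subset.
STEPS_PER_EPOCH = TRAIN_SAMPLES // BATCH_SIZE
VALIDATION_STEPS = TEST_SAMPLES // BATCH_SIZE

fit_kwargs = {
    "epochs": 10,
    "steps_per_epoch": STEPS_PER_EPOCH,
    "validation_steps": VALIDATION_STEPS,
}
```

Deriving the counts from `len(x_train)` and `len(x_test)` instead of hard-coding them would keep the values correct if the dataset changes.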

x_test = [images[i] for i in indexes[n:]]

y_train = [labels[i] for i in indexes[:n]]
y_test = [labels[i] for i in indexes[n:]]
Collaborator

Please use the existing train_test_split function to ensure correct execution

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
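
A minimal sketch of replacing the manual index slicing with `train_test_split`. The `images` and `labels` lists here are hypothetical stand-ins, and the `random_state` and `stratify` arguments are illustrative choices rather than something taken from the PR:

```python
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the PR's `images` and `labels` lists.
images = [f"image_{i}" for i in range(10)]
labels = [i % 2 for i in range(10)]

# test_size=0.2 matches the 80/20 split described in the thread;
# random_state and stratify are illustrative choices, not taken from the PR.
x_train, x_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.2, random_state=42, stratify=labels
)
```

A fixed `random_state` reproduces the same split on every run, so the same data split can be reused across training sessions without storing index files.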

Collaborator Author

sergTch commented Feb 8, 2021

I saved the model and the split data mostly because it's hard for my laptop to finish training in a reasonable amount of time. So I spread the training over three nights, saving the model after each one and using the same data split.

Collaborator Author

sergTch commented Feb 9, 2021

I fixed the mistakes you mentioned in task 3 and also pushed a solution for task 4.

2 participants