
Project Name

This project is a starter pack for MLOps projects based on the subject "movie_recommandation". It is not perfect, so feel free to modify it.

Project Organization

├── LICENSE
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third-party sources: the external data you want to make predictions on.
│   ├── preprocessed   <- The final, canonical data sets for modeling.
│   │   ├── image_train         <- Where you put the images of the train set.
│   │   ├── image_test          <- Where you put the images of the predict set.
│   │   ├── X_train_update.csv  <- The CSV file with the columns designation, description, productid, imageid (see the loading sketch after this tree).
│   │   └── X_test_update.csv   <- The CSV file with the same columns as X_train_update.csv.
│   └── raw            <- The original, immutable data dump.
│       ├── image_train <- Where you put the images of the train set.
│       └── image_test  <- Where you put the images of the predict set.
│
├── logs               <- Logs from training and predicting
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   ├── main.py        <- Script to train the models
│   ├── predict.py     <- Script to use trained models to make predictions on the files in ../data/preprocessed
│   │
│   ├── data           <- Scripts to download or generate data
│   │   ├── check_structure.py    
│   │   ├── import_raw_data.py 
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models                
│   │   └── train_model.py
│   └── config         <- Describes the parameters used in train_model.py and predict.py
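
Both CSV files share the same four columns. A minimal sketch of loading one of them with pandas and checking that the expected columns are present (the column names come from the tree above; the rest is illustrative):

    import pandas as pd

    # Load the preprocessed training table (path taken from the project tree).
    df = pd.read_csv("data/preprocessed/X_train_update.csv")

    # The file is expected to provide these four columns.
    expected = {"designation", "description", "productid", "imageid"}
    missing = expected - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns: {missing}")

    print(df.head())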

Once you have cloned the GitHub repo, open an Anaconda PowerShell prompt at the root of the project and follow these instructions:

conda create -n "Rakuten-project" <- It will create your conda environment

conda activate Rakuten-project <- It will activate your environment

conda install pip <- May be optional

pip install -r requirements.txt <- It will install the required packages

python src/data/import_raw_data.py <- It will import the tabular data into data/raw/

Download the image data from https://challengedata.ens.fr/participants/challenges/35/ and save the image_train and image_test folders locally, respecting the following structure:

├── data
│   └── raw
│       ├── image_train
│       └── image_test
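
Before moving on, you can quickly check that the folders landed in the right place. A small sketch of such a check (the repository's own src/data/check_structure.py may already cover this):

    from pathlib import Path

    # The image folders must sit under data/raw, as in the tree above.
    for folder in ("data/raw/image_train", "data/raw/image_test"):
        status = "OK" if Path(folder).is_dir() else "MISSING"
        print(f"{folder}: {status}")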

python src/data/make_dataset.py data/raw data/preprocessed <- It will copy the raw dataset into data/preprocessed/
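
In essence, this step takes an input and an output folder and copies the raw files across. A minimal sketch of that logic, assuming the real make_dataset.py may also do extra checks or cleaning:

    import shutil
    import sys
    from pathlib import Path

    # Usage: python make_dataset.py <input_dir> <output_dir>
    input_dir, output_dir = Path(sys.argv[1]), Path(sys.argv[2])

    # Copy every file and folder from the raw dataset into the preprocessed folder.
    shutil.copytree(input_dir, output_dir, dirs_exist_ok=True)
    print(f"Copied {input_dir} -> {output_dir}")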

python src/main.py <- It will train the models on the dataset and save them in the models folder. By default, the number of epochs is 1
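
The entry point mainly needs to expose the number of epochs (1 by default) and hand off to the training code under src/models/. A hedged sketch of such an interface; the --epochs flag is an assumption, not necessarily what src/main.py actually accepts:

    import argparse

    def main():
        parser = argparse.ArgumentParser(description="Train the models and save them in models/")
        # One epoch by default, matching the behaviour described above.
        parser.add_argument("--epochs", type=int, default=1)
        args = parser.parse_args()

        # The real script would build the datasets, call the training code in
        # src/models/train_model.py, and save the fitted models into models/.
        print(f"Training for {args.epochs} epoch(s)...")

    if __name__ == "__main__":
        main()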

python src/predict.py <- It will use the trained models to predict the prdtypecode of the desired data. By default, it predicts on the training set; you can pass the paths to the data and images as arguments to change this

Example: python src/predict.py --dataset_path "data/preprocessed/X_test_update.csv" --images_path "data/preprocessed/image_test"
                                    
The predictions are saved in data/preprocessed as 'predictions.json'.
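
The prediction step follows the same pattern: two optional path arguments (as in the example above) that default to the training data, and a predictions.json written to data/preprocessed. A hedged sketch of that interface only; the model loading and inference are left as comments since they depend on the trained models:

    import argparse
    import json
    from pathlib import Path

    def main():
        parser = argparse.ArgumentParser(description="Predict the prdtypecode for a dataset")
        # Defaults point to the training data, as described above.
        parser.add_argument("--dataset_path", default="data/preprocessed/X_train_update.csv")
        parser.add_argument("--images_path", default="data/preprocessed/image_train")
        args = parser.parse_args()

        # The real script would load the trained models from models/ and run
        # inference on args.dataset_path / args.images_path.
        predictions = {}  # e.g. {productid: predicted prdtypecode}

        out_path = Path("data/preprocessed/predictions.json")
        out_path.write_text(json.dumps(predictions, indent=2))
        print(f"Predictions saved to {out_path}")

    if __name__ == "__main__":
        main()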

You can download the trained models from https://drive.google.com/drive/folders/1fjWd-NKTE-RZxYOOElrkTdOw2fGftf5M?usp=drive_link and place them in the models folder.

Project based on the cookiecutter data science project template. #cookiecutterdatascience

If you run make_dataset.py from inside src/data/, pass the paths relative to that directory: python make_dataset.py "../../data/raw" "../../data/preprocessed"
