Git commands for creating a GitHub repository from a local directory on the command line, pushing the folder to the created repo, deleting selected folders, and committing the changes:
git init -b x0pa-ai
gh repo create x0pa-ai
git push --set-upstream origin x0pa-ai
git rm -r --cached folder
git commit -m "Removed folder from repository"
git push origin x0pa-ai
x0pa_ds_interview_round_2_train.xlsx - Training dataset used to train the model
x0pa_ds_interview_round_2_test.xlsx - Test dataset without the Job Category column, used to generate model predictions and measure performance
x0pa_ds_interview_round_2_test_predictions.xlsx - Test dataset including the model prediction column
sgd_classifier.pkl - Trained SGD classifier, persisted for the inference pipeline
pipe.pkl - scikit-learn pipeline fitted on the training job descriptions for preprocessing; reusable at inference time (see the sketch below)
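As a minimal sketch of how the two pickles could be combined at inference time (the column names 'Job Description' and 'Predicted Category' are illustrative assumptions; check the notebooks for the actual schema):

import pickle
import pandas as pd

# Load the fitted preprocessing pipeline and classifier (assumed to be a
# fitted sklearn transformer and estimator, per the descriptions above).
with open('pipe.pkl', 'rb') as f:
    pipe = pickle.load(f)
with open('sgd_classifier.pkl', 'rb') as f:
    clf = pickle.load(f)

df = pd.read_excel('x0pa_ds_interview_round_2_test.xlsx')
X = pipe.transform(df['Job Description'])   # 'Job Description' is an assumed column name
df['Predicted Category'] = clf.predict(X)   # assumed name for the prediction column
df.to_excel('x0pa_ds_interview_round_2_test_predictions.xlsx', index=False)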
Inside the nlp_module folder are some importable modules I created that are worth highlighting.
Dataprocessor.py - A class and its methods for importing files in all common formats, transforming them into a pivot table, and normalisation
Modelevaluator.py - A class and its methods for model training, cross-validation, evaluation, and persistence to a pickle file
I have also added an __init__.py file inside this directory referencing these scripts, so that they are treated as importable modules.
from nlp_module.Dataprocessor import *
from nlp_module.Modelevaluator import *
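For reference, an __init__.py along these lines would support the imports above (illustrative; the actual file in the repo may differ, and even an empty __init__.py is enough to mark the directory as a package):

# nlp_module/__init__.py (illustrative)
from . import Dataprocessor
from . import Modelevaluator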
A Streamlit app that returns predictions from the trained ML model.
Development set-up instructions
First, open a command-line interface and clone the GitHub repo into your workspace:
git clone https://github.com/Anirban6393/x0pa-ai.git
cd x0pa-ai
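Optionally, create and activate a virtual environment first so the dependencies stay isolated (shown here for Linux/macOS; on Windows, activate with venv\Scripts\activate instead):

python -m venv venv
source venv/bin/activate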
Next, install the required packages listed in requirements.txt:
pip install -r requirements.txt
Now run the app.py Python script, which interfaces with end users: it accepts an uploaded Excel file and dumps it into a SQLite3 database. Upload x0pa_ds_interview_round_2_test.xlsx.
streamlit run app.py
Open http://localhost:8501/ in your browser to view the list of job titles predicted by the model for the given job descriptions.
You can also run SQL queries to view and manipulate the data stored in SQLite3. For instance, the query below returns the number of job postings for each job category.
SELECT Type, COUNT(ID) as Job_Counts FROM Jobs GROUP BY Type ORDER BY Job_Counts DESC;
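The same query can also be run outside the app with Python's built-in sqlite3 module; a quick sketch, assuming the database file is named jobs.db (the actual filename is set in app.py):

import sqlite3

conn = sqlite3.connect('jobs.db')  # hypothetical filename; check app.py for the real one
query = """
    SELECT Type, COUNT(ID) AS Job_Counts
    FROM Jobs
    GROUP BY Type
    ORDER BY Job_Counts DESC;
"""
for job_type, count in conn.execute(query):
    print(job_type, count)
conn.close()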
To begin with, build the Docker image. This uses the Dockerfile in the directory.
docker build -f Dockerfile -t app/x0pa .
Next, create and run a container in detached mode.
docker run -p 8501:8501 -d app/x0pa
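To confirm the app started correctly, you can follow the container logs (a standard Docker command, not specific to this repo):

docker logs -f <container_id>/<container_name>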
View all containers, both running and stopped.
docker ps -a
Stop and kill a container.
docker container stop <container_id>/<container_name>
docker container kill <container_id>/<container_name>
Restart a stopped container.
docker container restart <container_id>/<container_name>
To delve into a Docker container, open a shell inside it. Install vim to get a Linux editor for viewing and editing files inside the container's directories.
docker exec -it <container_id>/<container_name> bash
apt-get update && apt-get install -y vim
ls -lrt
vi <filename>
In app.py, the line shown below is commented out because the test Excel file has already been ingested into the database; leaving it active would re-ingest the file every time a container starts. Uncomment it if you want to ingest additional files: open the file, press i to enter insert mode, delete the leading # on that line, then press Esc and type :wq! to save and quit.
vi app.py
i
#df.to_sql('Jobs', con=engine, if_exists='append', index=False)
:wq!
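After saving, restart the container so Streamlit reloads the modified script:

docker container restart <container_id>/<container_name>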