Scent Angels

We have created a website that uses machine learning to predict if a perfume is for men, women or unisex based on perfume notes. Try our “Perfume Designer” App where you can create your perfume and see which gender it would be ideal for. A brief introduction to our website development is here: https://docs.google.com/presentation/d/1kLpmB_BrK-Ui5TlWWvtS06L78TBK4_cAGl9DQRv548c/edit#slide=id.gbf44f01a53_0_25
Check out our LIVE Heroku Webpage here: https://scent-angels.herokuapp.com/

Find Your Favorite Scent!

General info

Team Members:

Data and Information Sources

Web Scraping Data

Fragrantica

Perfume Terminology

Technologies

Python
Flask
JavaScript
HTML/CSS
Bootstrap
JSON
MongoDB
Heroku

Libraries and Dependencies

Python Dependencies

BeautifulSoup
Flask
Flask_pymongo
Json
Matplotlib
Numpy
Pandas
Pprint
Pymongo
Re
Requests
Splinter
Selenium
Seaborn
Sklearn
Tensorflow
Time
Webdriver_manager.chrome

JavaScript Dependencies

D3 Javascript
D3 ToolTip

HTML and CSS Dependencies

Data Selection, Processing & Cleanup

Selecting the Data Source

The fragrantica.com website provided information on perfumes/cologne available for sale, including the following:

Fragrance name and designer
Fragrance image
Main accords
Top notes, middle notes, and base notes
Perfume rating out of 5
Sillage and Longevity
Price Value
Gender

Web Scraping

We initially wrote web-scraping code to gather all perfumes from the fragrantica.com website. However, on the main search page, clicking the "see more results" button maxed out at 1,000 perfumes, even though the site has over 60,000 perfumes on record. We rewrote our code to scrape the data by year in an effort to gather as many perfumes as possible, presuming that no more than 1,000 perfumes would have been developed in any one year (from the 1920s through 2020). If more than 1,000 perfumes were produced in a single year, then those extra perfumes would not be reflected in our database. Additionally, due to other web scraping limitations, we were able to scrape a total of only 517 perfumes during the project timeframe.
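The effect of the 1,000-result cap on a per-year scrape can be sketched as follows. This is a minimal illustration, not the project's scraping code: the function name and the per-year counts are made up, and only the cap of 1,000 comes from the description above.

```python
RESULT_CAP = 1000  # per-query limit described above


def reachable_per_year(counts_by_year, cap=RESULT_CAP):
    """Return {year: (reachable, missed)} when each year is scraped as one query."""
    summary = {}
    for year, total in sorted(counts_by_year.items()):
        reachable = min(total, cap)
        summary[year] = (reachable, total - reachable)
    return summary


# Made-up counts, for illustration only (not Fragrantica figures).
example_counts = {1995: 240, 2010: 980, 2019: 1350}
summary = reachable_per_year(example_counts)
total_missed = sum(missed for _, missed in summary.values())
print(summary)
print("perfumes missed:", total_missed)
```

Any year whose count exceeds the cap loses the overflow, which is the limitation noted above.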

We also scraped all available perfume notes, for a total of 1,012.

Processing and Cleanup

Performed web scraping in a Jupyter notebook using BeautifulSoup, Splinter, and Selenium.
Converted the scraped perfume_data to a dictionary or list of dictionaries and exported it to a JSON file.
Converted the scraped perfume_notes to a CSV file.
Combined the JSON files from multiple web scrapes in a Jupyter notebook and exported the combined JSON file to MongoDB to create the perfume_notes collection.
Imported the CSV into a Jupyter notebook, cleaned the columns to optimize export to MongoDB, and exported the CSV.
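Combining the output of several scrapes might look roughly like the sketch below. The `name` key, the sample records, and the de-duplication rule (keep the first record per name) are assumptions for illustration; the project's notebooks may do this differently.

```python
import json


def combine_scrapes(json_texts, key="name"):
    """Merge several JSON arrays of perfume records, keeping the first record per key."""
    combined = {}
    for text in json_texts:
        for record in json.loads(text):
            combined.setdefault(record[key], record)
    return list(combined.values())


# Two tiny stand-ins for scrape output files (illustrative data).
scrape_a = '[{"name": "Perfume A", "gender": "women"}]'
scrape_b = (
    '[{"name": "Perfume A", "gender": "women"},'
    ' {"name": "Perfume B", "gender": "unisex"}]'
)
records = combine_scrapes([scrape_a, scrape_b])
print(len(records))

# With a MongoDB server running, the records could then be loaded with pymongo
# (the connection string and collection_name are placeholders):
#   from pymongo import MongoClient
#   db = MongoClient("mongodb://localhost:27017")["perfume_db"]
#   db[collection_name].insert_many(records)
```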

Machine Learning Development

Based on the perfume notes, we wanted to check if we could predict if the perfume is for men, women or unisex.

We loaded the perfume data from MongoDB. The perfume notes are divided as top, middle and base notes. Each of these fields is a list. So we had to use MultiLabelBinarizer to create a feature column for each note based on whether it was present as a top, middle or base note.
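A minimal sketch of this encoding step, assuming hypothetical field names (`top_notes`, `middle_notes`, `base_notes`) and a `top_`/`middle_`/`base_` column prefix so that the same note in different tiers becomes a separate feature:

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Illustrative records; the field names are assumed.
perfumes = pd.DataFrame({
    "top_notes": [["bergamot", "lemon"], ["rose"]],
    "middle_notes": [["rose"], ["jasmine", "iris"]],
    "base_notes": [["musk"], ["musk", "vanilla"]],
})


def binarize_notes(df, column, prefix):
    """One-hot encode a list-valued column, prefixing each new column name."""
    mlb = MultiLabelBinarizer()
    encoded = mlb.fit_transform(df[column])
    names = [f"{prefix}_{note}" for note in mlb.classes_]
    return pd.DataFrame(encoded, columns=names, index=df.index)


X = pd.concat(
    [
        binarize_notes(perfumes, "top_notes", "top"),
        binarize_notes(perfumes, "middle_notes", "middle"),
        binarize_notes(perfumes, "base_notes", "base"),
    ],
    axis=1,
)
print(X.columns.tolist())
```

Here "rose" yields both a `top_rose` and a `middle_rose` column, each 1 only for perfumes that list it in that tier.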

If a note was not an ingredient in the perfume, it was marked as 0; otherwise it was marked as 1. We decided to use only notes as features. Accords are a combination of notes, so we dropped them as a feature. Longevity, sillage, gender vote and price value do not affect whether the perfume is for a particular gender, so we skipped those as features as well. After dropping unnecessary columns from the perfume dataframe, the result was X (data), which holds all the features, and y (target), which indicates whether the perfume is for men, women or unisex.

A dataframe listing all the features was created and uploaded to MongoDB; it is used later in app.py. Another dataframe with all the notes from the perfumes in the perfume_data collection was created and added to MongoDB as notes_features. This is used for the list of notes on the "Create A Fragrance" page.
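One plausible use of the stored feature list in the web app is to turn a user's selected notes into a 0/1 vector with the same column order the model was trained on. The sketch below assumes that purpose and a hypothetical `top_`/`middle_`/`base_` column naming scheme; it is not taken from app.py.

```python
def build_feature_vector(feature_columns, top, middle, base):
    """Return a 0/1 list aligned with feature_columns for the selected notes."""
    selected = (
        {f"top_{n}" for n in top}
        | {f"middle_{n}" for n in middle}
        | {f"base_{n}" for n in base}
    )
    return [1 if col in selected else 0 for col in feature_columns]


# Hypothetical feature columns, in training order.
feature_columns = ["top_bergamot", "top_rose", "middle_rose", "base_musk", "base_vanilla"]
vector = build_feature_vector(
    feature_columns, top=["rose"], middle=["rose"], base=["vanilla"]
)
print(vector)
```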

We also looked at the feature importance, but did not remove any since they had almost the same importance.

After splitting the data into train and test sets, we tried the following models and compared their classification reports to find the best one:

SVC

We used GridSearch with the following parameters:

              'kernel': ('linear', 'rbf')}
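The parameter grid above is only partly preserved. A minimal sketch of such a search with scikit-learn's GridSearchCV follows. The list of C values is an assumption (this README only mentions C=20 and the two kernels), and the data is synthetic.

```python
import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic 0/1 note features and three gender classes, for illustration only.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(90, 12))
y = np.array(["men", "women", "unisex"] * 30)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y
)

# The C values are assumed; the kernels are the two named above.
param_grid = {"C": [1, 10, 20], "kernel": ("linear", "rbf")}
grid = GridSearchCV(SVC(), param_grid, cv=3)
grid.fit(X_train, y_train)

print(grid.best_params_)
print(classification_report(y_test, grid.predict(X_test), zero_division=0))
```

Because the features here are random, the scores are meaningless; the point is only the shape of the search and the report.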

KNearestNeighbors

DecisionTree

RandomForest

Deep Learning Neural Network

From the classification reports, we decided that SVC with kernel:rbf and C=20 was the best model with an accuracy of 0.62.

We saved this model using joblib.

import joblib

# Keep the best estimator found by the grid search and save it to disk.
final_model = grid.best_estimator_
filename = '../webapp/static/Resources/gender_perfume_model.sav'
joblib.dump(final_model, filename)
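Loading the saved file back, for example in the web app, would look roughly like the round trip below. The tiny stand-in model and the temporary path are assumptions so the snippet runs on its own; only the file name and the `kernel`/`C` settings come from this README.

```python
import os
import tempfile

import joblib
from sklearn.svm import SVC

# Tiny stand-in model; the real one is the fitted grid.best_estimator_.
samples = [[0, 1], [1, 0], [1, 1], [0, 0]]
labels = ["men", "women", "men", "women"]
model = SVC(kernel="rbf", C=20).fit(samples, labels)

# Save to a temporary directory instead of the project's Resources folder.
path = os.path.join(tempfile.mkdtemp(), "gender_perfume_model.sav")
joblib.dump(model, path)

# Load the model back and predict for one feature vector.
loaded = joblib.load(path)
prediction = loaded.predict([[0, 1]])
print(prediction)
```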

Tableau Development

For the Tableau visualizations, we used a Jupyter notebook to convert our JSON to a CSV. We used a calculated field to change Rating from a string to an integer. We also truncated Rating to create a bar chart of the ratings in the second dashboard. The dashboards were then embedded into our website.
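The JSON-to-CSV step could be sketched with the standard library as shown below. The field names and the choice to join list-valued notes with "; " are assumptions; the project's notebook may have used pandas instead. The last lines show a Python equivalent of truncating a string rating, which the project did in Tableau.

```python
import csv
import io
import json

# Illustrative record; the field names are assumed.
raw = '[{"name": "Perfume A", "rating": "4.25", "top_notes": ["rose", "lemon"]}]'
records = json.loads(raw)


def records_to_csv(records):
    """Write records as CSV text, joining list values into one cell."""
    fieldnames = list(records[0].keys())
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    for record in records:
        row = {
            key: "; ".join(value) if isinstance(value, list) else value
            for key, value in record.items()
        }
        writer.writerow(row)
    return buffer.getvalue()


csv_text = records_to_csv(records)
print(csv_text)

# Truncating a string rating to its integer part.
truncated = int(float(records[0]["rating"]))
print(truncated)
```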

Perfume Designer App

We created a web-app using Flask, HTML, CSS, JavaScript, and D3. Our app includes a Home page and 3 interactive webpages:

Create Your Fragrance: lets you create a perfume based on a selection of Top, Middle, and Base Notes. You can select up to three notes of each: scroll through and peruse the list, or simply start typing a note that strikes your fancy. Our machine learning model will let you know if that scent is best for men, women, or both. Our model will also tell you whether you've created an entirely new perfume! Finally, if an existing scent contains at least one each of the top, middle, and base notes you selected, our model will return the names of those scents for you to consider in your search for the perfect perfume!
Perfume Info: Click on a fragrance term to see its definition and gain a better understanding of what makes a perfume!
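The matching rule described for Create Your Fragrance (an existing scent must share at least one selected note in each of the three tiers) can be sketched as below. The record layout and function name are hypothetical; this is not the app's actual lookup code.

```python
def find_matches(perfumes, top, middle, base):
    """Names of perfumes sharing at least one selected note in each tier."""
    selected = {
        "top_notes": set(top),
        "middle_notes": set(middle),
        "base_notes": set(base),
    }
    return [
        perfume["name"]
        for perfume in perfumes
        if all(chosen & set(perfume[tier]) for tier, chosen in selected.items())
    ]


# Illustrative records; the field names are assumed.
perfumes = [
    {"name": "Perfume A", "top_notes": ["bergamot", "lemon"],
     "middle_notes": ["rose"], "base_notes": ["musk"]},
    {"name": "Perfume B", "top_notes": ["lemon"],
     "middle_notes": ["jasmine"], "base_notes": ["vanilla"]},
]
matches = find_matches(
    perfumes, top=["lemon"], middle=["rose", "iris"], base=["musk", "amber"]
)
print(matches)
```

Perfume B shares a top note with the selection but no middle or base note, so only Perfume A is returned.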

Setup

Setup Instructions

Create a new conda environment with Python 3.7.
Install the dependencies with pip install -r requirements.txt
Web scraping: The following files were used for web scraping or to create the JSON files that were imported into MongoDB, and DO NOT need to be run: Notes_Scrape.ipynb and Perfume_Scrape.ipynb
Run https://github.com/sir-omoreno/final-project/blob/main/mongo_db/mongo_db_creation.ipynb to create the mongo database perfume_db, and the perfume_data and perfume_notes collections
Run https://github.com/sir-omoreno/final-project/blob/main/ML/Perfume_ML_Gender.ipynb to create the perfume_features and note_features collections, and the machine learning model (gender_perfume_model.sav)
Run https://github.com/sir-omoreno/final-project/blob/main/webapp/app.py in your new conda environment to open the flask web application.

Lessons Learned

We had initially planned to create a machine learning model that would predict the popularity of a perfume based on its features. However, this model returned negative R2 scores, indicating that it was a terrible fit for the data. Had we been able to scrape more perfumes and tune our model, we might have had better results.
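As a reminder of why a negative R2 is a bad sign: R2 compares a model's squared error with that of always predicting the mean, so it drops below zero when the model does worse than that baseline. A small worked example with made-up ratings:

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up ratings, for illustration only.
ratings = [3.0, 4.0, 5.0]
baseline = r2_score(ratings, [4.0, 4.0, 4.0])  # always predict the mean
poor_fit = r2_score(ratings, [5.0, 4.0, 3.0])  # predictions run the wrong way
print(baseline, poor_fit)
```

The mean-only baseline scores 0, while the reversed predictions score well below it.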

Inspiration

Inspired by Rutgers Data Visualization Bootcamp & smelly people everywhere!

Contact

Created by:

