Heibattttt/Michelin-restaurant-in-Italy-web-scraping

Homework 3 - Algorithmic Methods of Data Mining

Michelin Restaurants in Italy

This repository contains the solution for Homework 3 of the course Algorithmic Methods of Data Mining.


The main goal of this homework is to explore and analyze data on Michelin restaurants in Italy. The project includes multiple tasks, such as data cleaning, feature extraction, text processing, and visualization. To collect the data, we scraped the Michelin restaurants site, extracted the HTML of each restaurant listed there, and saved it in a folder called HTML Michelin Restaurants.
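The "one HTML file per restaurant" step can be sketched as follows. This is a minimal illustration and not the notebook's actual code: the function name `save_restaurant_pages`, the `restaurant_<n>.html` file naming, and the injected `fetch` callable are all assumptions made for the example.

```python
from pathlib import Path
from typing import Callable, Iterable


def save_restaurant_pages(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    out_dir: str = "HTML Michelin Restaurants",
) -> list[Path]:
    """Store the HTML of each restaurant URL as one file in out_dir.

    `fetch` is a hypothetical callable that returns the HTML text for a URL.
    """
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, url in enumerate(urls, start=1):
        target = folder / f"restaurant_{i}.html"  # assumed naming scheme
        if not target.exists():  # skip pages saved by a previous run
            target.write_text(fetch(url), encoding="utf-8")
        saved.append(target)
    return saved
```

With the `requests` library installed, `fetch` could be something like `lambda url: requests.get(url, timeout=30).text`. Passing the downloader in as a parameter keeps the file-saving logic usable offline, for example with already-downloaded pages.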

Repository Contents:

  • Project/: A folder containing a notebook with the progress and comments for the tasks performed, and a .py file with the code of an advanced search engine. Specifically, it includes:
    • Michelin-restaurant-in-Italy-web-scraping.ipynb: A Jupyter Notebook containing Python code, explanations, and outputs for each question of the homework, plus pseudocode for an algorithm that moves a robot through a warehouse to collect packages.
      • Warning: To view all of the notebook's output, download the file and open it in a supported development environment.
    • search_engine_filters.py: This file was used in Michelin-restaurant-in-Italy-web for the Bonus part. It is an interactive restaurant search engine built with Python and ipywidgets that offers advanced filtering capabilities. Its interface uses dynamic drop-down menus and checkboxes.
      • Warning: To view the interactive interface, run the code cell in an environment with ipywidgets enabled. Here is an example of how it works: Search Engine
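The filtering behind such an interface can be sketched without the widgets. The example below is an illustration only and is not taken from search_engine_filters.py: the function name `filter_restaurants` and the field names `city` and `cuisine` are hypothetical, and the sample rows are made up.

```python
from typing import Iterable, Optional


def filter_restaurants(
    rows: Iterable[dict],
    city: Optional[str] = None,
    cuisines: Optional[set] = None,
) -> list:
    """Keep the rows that match every active filter; None means 'no filter'."""
    selected = []
    for row in rows:
        # Drop-down style filter: a single city, compared case-insensitively.
        if city is not None and row["city"].lower() != city.lower():
            continue
        # Checkbox style filter: any of the ticked cuisine types.
        if cuisines and row["cuisine"] not in cuisines:
            continue
        selected.append(row)
    return selected


# Made-up rows, for illustration only.
restaurants = [
    {"name": "A", "city": "Roma", "cuisine": "Italian"},
    {"name": "B", "city": "Milano", "cuisine": "Creative"},
    {"name": "C", "city": "Roma", "cuisine": "Creative"},
]
print([r["name"] for r in filter_restaurants(restaurants, city="roma")])
# → ['A', 'C']
```

In an ipywidgets front end, the drop-down's selected value would presumably feed `city` and the ticked checkboxes would feed `cuisines`, with the function re-run whenever a widget changes.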

Dropbox files used for the project:

  • files/: A Dropbox folder containing all output files generated from the homework tasks. In detail:
    • all_restaurants_data.csv: Dataset containing all information about Michelin restaurants in Italy, collected from the Michelin website. This file is the output for Question 1.3;
    • vocabulary.csv: A CSV file that maps each word in the description column of all_restaurants_data.csv to a unique integer (term_id). This file is the result of Question 2.1.1;
    • inverted_index.pkl: A pickle file containing a dictionary mapping each term_id to a list of document IDs where that term appears. This file is the output for Question 2.1.1;
    • coordinates.csv: A CSV file containing all unique city coordinates in the dataset. This file is the output for Question 4;
    • top_k_restaurants_map.html: An HTML file displaying the top-k Michelin restaurants based on a custom scoring system, visualized on a map. This is the result for Question 4. To view the map, download the HTML file or open the notebook in a development environment such as Visual Studio, Jupyter, or PyCharm.
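The relationship between vocabulary.csv and inverted_index.pkl described above can be illustrated with a small sketch. It assumes simple lowercase whitespace tokenization and uses the list position of each description as its document ID. The notebook's real preprocessing may differ, and `build_index` is a hypothetical name.

```python
import pickle


def build_index(descriptions):
    """Return (vocabulary, inverted_index) for a list of description strings."""
    vocabulary = {}      # word -> term_id
    inverted_index = {}  # term_id -> list of document IDs containing the word
    for doc_id, text in enumerate(descriptions):
        # set() so that each document is listed at most once per term.
        for word in sorted(set(text.lower().split())):
            term_id = vocabulary.setdefault(word, len(vocabulary))
            inverted_index.setdefault(term_id, []).append(doc_id)
    return vocabulary, inverted_index


# Made-up descriptions, for illustration only.
descriptions = ["creative seafood menu", "classic seafood trattoria"]
vocabulary, inverted_index = build_index(descriptions)
print(inverted_index[vocabulary["seafood"]])  # → [0, 1]

# The index survives a pickle round trip, as it would when stored in a .pkl file.
restored = pickle.loads(pickle.dumps(inverted_index))
assert restored == inverted_index
```

A query is then answered by mapping each query word to its term_id and intersecting the corresponding document lists.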
