The dataset used in this homework can be found here.
It contains information about listings of houses and apartments on the Airbnb platform; all the buildings are located in Texas, US.
The result is a Jupyter Notebook that contains all the work done, explained step by step, plus a few links to the maps requested by the bonus point.
The purpose of the homework was to build a search engine able to retrieve information from 18 thousand documents describing Airbnb listings in the state of Texas, US, given a text query provided by the user.
At first, the search engine returns all the documents (listings) that contain every word of the query. In later steps we had to return better results by computing tf-idf scores for the documents and using cosine similarity as the scoring function.
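The two retrieval steps described above can be sketched as follows. This is a minimal illustration in plain Python, not the code from the notebook: the function names, the whitespace tokenizer, and the exact tf-idf weighting (relative term frequency times the natural log of inverse document frequency) are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split; the notebook may preprocess differently.
    return text.lower().split()


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def conjunctive_search(index, query):
    """Return the ids of the documents that contain every query term."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


def tfidf_vector(tokens, index, n_docs):
    """Sparse tf-idf vector (term -> weight) for a list of tokens."""
    counts = Counter(tokens)
    return {
        term: (count / len(tokens)) * math.log(n_docs / len(index[term]))
        for term, count in counts.items()
        if term in index
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def ranked_search(docs, index, query):
    """Rank the conjunctive matches by cosine similarity to the query."""
    n_docs = len(docs)
    query_vec = tfidf_vector(tokenize(query), index, n_docs)
    scored = [
        (cosine(query_vec, tfidf_vector(tokenize(docs[doc_id]), index, n_docs)), doc_id)
        for doc_id in conjunctive_search(index, query)
    ]
    return sorted(scored, reverse=True)
```

With an index built once via `build_index(docs)`, calling `ranked_search(docs, index, "cozy house")` would return `(score, doc_id)` pairs, best match first.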
Our creative approach then came in, as the final request was to design a new scoring function for our search engine. Based on the information we could retrieve from the documents, we decided to take into account the distance of the house/apartment from the city center and the ratio between the average price per night and the number of bedrooms. By normalizing those values we were finally able to obtain a different way to score our results.
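One possible way to combine the two normalized quantities is sketched below. The README does not state how they are normalized or weighted, so the min-max normalization, the equal weights, the "lower is better" convention, and the dictionary keys (`distance_km`, `price`, `bedrooms`) are all assumptions made for illustration.

```python
def min_max(values):
    """Rescale a list of numbers to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal: no listing is better than another on this feature.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def custom_scores(listings):
    """Score listings so that close and cheap-per-bedroom ones rank higher.

    Each listing is a dict with the hypothetical keys
    'distance_km', 'price' and 'bedrooms'.
    """
    distances = min_max([l["distance_km"] for l in listings])
    # Assumption: treat a listing with 0 bedrooms (e.g. a studio) as having 1.
    ratios = min_max([l["price"] / max(l["bedrooms"], 1) for l in listings])
    # Assumption: equal weights; 1.0 is the best possible score, 0.0 the worst.
    return [1.0 - 0.5 * (d + r) for d, r in zip(distances, ratios)]
```

Under these assumptions, a listing that is both the closest to the center and the cheapest per bedroom gets a score of 1.0.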
This repository also presents our solution to the "Bonus" point of the homework. For a given place (specified by name or by coordinates), we had to show all the documents (buildings) in the dataset that lie within a given range (in kilometers) of an address. Clicking a marker on the map opens the link to the related apartment.
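The range filter behind the bonus point can be sketched with the haversine great-circle distance. This is only an illustration under stated assumptions: the notebook may compute distances differently, and the `latitude` and `longitude` keys are hypothetical names for the listing coordinates.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(listings, center, radius_km):
    """Keep the listings whose coordinates lie within radius_km of center.

    center is a (latitude, longitude) pair; each listing is a dict with the
    hypothetical keys 'latitude' and 'longitude'.
    """
    lat0, lon0 = center
    return [
        listing
        for listing in listings
        if haversine_km(lat0, lon0, listing["latitude"], listing["longitude"]) <= radius_km
    ]
```

The listings returned by `within_radius` are the ones that would be drawn as markers on the map.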
The links below show the maps for Dallas and San Antonio, which cannot be displayed in the uploaded Jupyter notebook.
https://nbviewer.jupyter.org/github/Edoardoba/Homework03/blob/master/dallas.html
https://nbviewer.jupyter.org/github/Edoardoba/Homework03/blob/master/san_antonio.html