- Docker Link
- For Windows, please follow these instructions: https://docs.docker.com/docker-for-windows/install/
- On Windows, Docker requires that you enable the WSL2 feature
- You will also need docker-compose. For Windows, it is already included when you install Docker. For Linux, please see here. (A quick way to verify the installs is shown after this list.)
- Familiarity with Docker containers.
- QGIS Link, if you want a complete stack
- Git Link
- A database client, to explore the data inside the database (I am using DBeaver)
- Sublime Text (or a similar editor), with the ability to auto-format JSON strings
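To confirm that the tooling is in place before you continue, you can check the installed versions from your terminal (these are the standard version flags for each tool):

docker --version
docker-compose --version
git --version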
The project has several components:
- Database: A spatially enabled database (PostgreSQL 13 / PostGIS), to store not only the raw-ish responses fetched from Airbnb, but also spatial elements derived from these responses (e.g. listings as point geometries).
- Broker: The broker manages the work queues. The queues contain ordered messages that can be read by the workers; each message carries the instructions to initiate a task, i.e. the name of the task (function) to run, along with the necessary kwargs (arguments).
- Engine: The Engine contains the business logic of this application. It can run in two modes, both of which share the same codebase.
- Worker Mode: In this mode, the engine connects to the broker and checks for new work-orders. Each work-order contains a unit of work, such as requesting data from Airbnb or undertaking some maintenance work in the database.
- Scheduler Mode: In this mode, the code issues work-orders and sends them to a work queue at the broker at regular, predefined intervals. Example: "Do a calendar collection for the 9000 listings every 4 hours."
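As a quick sanity check of how these components map onto docker-compose services, once you have cloned the repository (see below) you can ask docker-compose to list the services that the two compose files define:

docker-compose -f docker-compose.yml -f docker-compose-local.yml config --services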
For this project we are using a smart proxy service from Zyte. Airbnb has measures in place that limit large-scale data scraping from its site by throttling the number of requests it accepts, so you will need a proxy service from Zyte or a similar provider.
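How the proxy credentials are wired into the stack depends on the project's settings; purely as an illustration, if they were supplied through an environment variable, it could look like the following (ZYTE_API_KEY is a hypothetical name here; check docker-compose.yml for the variables the project actually reads):

# hypothetical variable name; check docker-compose.yml for the real one
export ZYTE_API_KEY=<your-zyte-api-key>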
The objective of the initial setup is to set up the main components described above. Preliminary requirements, if not already completed, are as follows:
- Install Docker
- Please follow the instructions on www.docker.com.
- Please note that this could require multiple restarts if you have not installed it on your system before.
Also note that this project will be using docker-compose. On Windows it comes with Docker for Windows; this is not the case for Linux machines. You may need to update your BIOS to enable hardware-assisted virtualisation and data execution protection. The specific setting within your BIOS will vary based on hardware vendor. See here for more details.
- Install git.
(After you have installed all three pieces of software above, open your favorite Windows terminal; I recommend PowerShell.)
- Using your terminal, change to your preferred working directory and issue the following command to download the code:
git clone -b preview https://github.com/urbanbigdatacentre/ubdc-airbnb
By default, the above command will clone the code repository into a subfolder called ubdc-airbnb.
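For example, to move into the freshly cloned folder and list its contents:

cd ubdc-airbnb
ls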
The repository contains a number of files and folders. The most important ones are docker-compose.yml and docker-compose-local.yml.
Together, these two files describe the necessary services of this project. The first one (docker-compose.yml) describes the worker and the scheduler, while docker-compose-local.yml describes the 'database' and the 'broker' parts. Feel free to remove the local variant and adjust the parameters if you want to use an externally managed database and broker system.
Most of the working settings for this project can be set in docker-compose.yml.
NB: You must have Docker installed.
Open a console and cd into the code directory. Then execute the following:
`docker-compose -f docker-compose.yml -f docker-compose-local.yml up db rabbit worker`
This will launch a complete stack with all the services:
- a broker
- an EMPTY, fresh database
- The data are stored inside the container, so when you remove it they will be deleted. The postgres service used here is based on the official postgres image. Please read the documentation here on how to make the data persistent.
- one worker.
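If you prefer to keep your console free, you can start the same stack in the background and follow the worker's logs separately (the -d and logs -f options are standard docker-compose features):

docker-compose -f docker-compose.yml -f docker-compose-local.yml up -d db rabbit worker
docker-compose -f docker-compose.yml -f docker-compose-local.yml logs -f worker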
The default values for the database are:
- Username: postgres
- Password: airbnb
- Database: airbnb
- Port: 5432
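With these defaults, and assuming the database port is published to your host, you should be able to connect with any PostgreSQL client, for example:

psql -h localhost -p 5432 -U postgres -d airbnb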
Note that if you want to change the default parameters, such as the root username, password, or database name, you can modify them inside the docker-compose.yml file BEFORE the first time you create the database. Afterwards, if you need to change any of these, you'll have to do it through the database itself. Please refer to the PostgreSQL manual for instructions.
Note that the database needs to be up and running (see HOW-TO).
Inside the project folder, run the following command:
docker-compose -f docker-compose.yml -f docker-compose-local.yml run --rm worker migrate
The worker will connect to the database and run the migration script, creating all the tables, relationships, and indexes.
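To verify that the migration worked, you can list the created tables from inside the running database container (this assumes the database service is named db, as in the command used to launch the stack above):

docker-compose -f docker-compose.yml -f docker-compose-local.yml exec db psql -U postgres -d airbnb -c '\dt'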
Note: All spatial-discovery operations are, by default, limited to within the land-layer mask. This effectively means that if there is no mask inside the system, you will not be able to define any areas!
The mask is a spatial layer that defines where in the world is land and where is water. It is needed because the code base will only try to act upon land (after all, there aren't any listings in the oceans - yet).
The land mask is the GADM level-0 world border polygons (LICENCE).
Using a ubdc-airbnb miniconda console, cd to the /src/dj_airbnb subfolder in the project directory, then issue the following command to import the land boundaries for a single country, identified by its ISO code:
docker-compose -f docker-compose.yml -f docker-compose-local.yml run --rm worker load-mask --only-iso GBR
The subroutine will download the GADM global border mask file (this step only has to be done once), and then import the country specified above.
If you want, you can import ALL the countries by omitting the --only-iso parameter, but the operation could take some time to complete:
docker-compose -f docker-compose.yml -f docker-compose-local.yml run --rm worker load-mask
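If you want to confirm that the mask was imported, you can count the rows in the mask table. The table name below is an assumption; use \dt to list the actual tables created by the migration:

# app_worldshape is a hypothetical table name; check \dt for the real one
docker-compose -f docker-compose.yml -f docker-compose-local.yml exec db psql -U postgres -d airbnb -c 'SELECT COUNT(*) FROM app_worldshape;'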
Requirements:
- An active, spatially enabled database (the database service from the stack above must be running)
- QGIS
- Open QGIS and start a new, empty project.
- Open the Data Source Manager (Ctrl-L).
- (Optional, but highly recommended) Add OSM tiles:
- Select XYZ source
- Create a new source as a basemap. Type the following string as the source:
http://tile.openstreetmap.org/{z}/{x}/{y}.png
- Click on PostgreSQL Layers, and create a new connection using the following default parameters:
- default username: postgres
- default password: airbnb
- default dbname: airbnb
- Click Connect.
- Add the 4 spatial layers.
- Enable editing on the app_aoishape layer.
- Draw a polygon. That's where the system will look for listings. Choose a place that is reasonably big - and therefore likely to contain some listings - but not too big.
- As attributes, the layer contains some flags, including collect_calendars etc. Make sure all of them are enabled for this demonstration.
- Save your AOI. Make a note of your polygon's ID. If it's the very first one drawn, the ID should be 1.
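If you forget the ID, you can also look it up directly in the database; the query below assumes the AOI table is named app_aoishape, matching the layer name used above:

docker-compose -f docker-compose.yml -f docker-compose-local.yml exec db psql -U postgres -d airbnb -c 'SELECT id FROM app_aoishape;'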
Using your miniconda console, with the ubdc-airbnb environment activated, cd to the /src/dj_airbnb subfolder found in your project source directory. There, run:
docker-compose -f docker-compose.yml -f docker-compose-local.yml run --rm worker prep-grid <AOI-ID>
(replace <AOI-ID> with the actual AOI ID that was given to the shape after it was saved in the database with QGIS)
The above command will generate the initial grids that will be used to form search queries to Airbnb.
Unfortunately, Airbnb limits the number of listings returned from a search query to a maximum of 300 (for quality-of-service reasons).
The code manages this restriction by requesting counts of listings in the generic grid. Where this count exceeds 50 (an empirical number), the code will subdivide the grid into 4 smaller child grids and repeat recursively until all grids have no more than 50 listings, as reported by Airbnb.
To initiate this grid scan:
docker-compose -f docker-compose.yml -f docker-compose-local.yml run --rm worker sense-aoi <AOI-ID>
(replace <AOI-ID> with the actual AOI ID that was given to the shape after it was saved in the database with QGIS)
Now that we have established our AOIs, we can start operating on them, usually harvesting data from known Airbnb listings, by creating tasks. The tasks are typically sent to a queue managed by our broker. The queue is monitored by workers, which take these tasks, act according to their instructions, and acknowledge their completion.
To send a generic task manually you can use the following command:
docker-compose -f docker-compose.yml -f docker-compose-local.yml run --rm worker send-task <name_of_the_task>
NB: For these commands to be successful, all the services with the exception of the scheduler must be running. You can refer to the instructions above on how to do this.
For convenience, we've set up dedicated commands for common tasks.
Discover listings in a designated AOI:
docker-compose -f docker-compose.yml -f docker-compose-local.yml run --rm worker find-listings <AOI-ID>
(replace <AOI-ID> with the actual AOI ID that was given to the shape after it was saved in the database with QGIS)
Once this command has been initiated, points representing Airbnb listing locations will start to populate the database table. These points can be visualised in QGIS.
Or, to discover/update the listings in all the AOIs that have been marked in QGIS with the flag scan_for_new_listings == True:
docker-compose -f docker-compose.yml -f docker-compose-local.yml run --rm worker send-task discover-listings
docker-compose -f docker-compose.yml -f docker-compose-local.yml run --rm worker fetch-listing-detail <LISTING-ID>
(replace <LISTING-ID> with an actual Airbnb listing ID; you can load all the listings in an analysis environment with QGIS)
docker-compose -f docker-compose.yml -f docker-compose-local.yml run --rm worker send-task get-listing-details
The above command will collect the listing details for the known listings within the AOIs that are marked with the flag collect_details == True.
docker-compose -f docker-compose.yml -f docker-compose-local.yml run --rm worker fetch-calendar <LISTING-ID>
(replace <LISTING-ID> with an actual Airbnb listing ID; you can load all the listings in an analysis environment with QGIS)
docker-compose -f docker-compose.yml -f docker-compose-local.yml run --rm worker send-task get-calendars
The above command will collect the calendars for the known listings within the AOIs that are marked with the flag collect_calendars == True.
docker-compose -f docker-compose.yml -f docker-compose-local.yml run --rm worker fetch-reviews <LISTING-ID>
(replace <LISTING-ID> with an actual Airbnb listing ID; you can load all the listings in an analysis environment with QGIS)
docker-compose -f docker-compose.yml -f docker-compose-local.yml run --rm worker send-task get-reviews
The above command will scan and collect the reviews and user details for the known listings within AOIs that are marked with the flag collect_review == True.
docker-compose -f docker-compose.yml -f docker-compose-local.yml run --rm worker send-task get-booking-quotes
The above command will scan and collect the booking quotes for the known listings within AOIs that are marked with the flag collect_booking_quotes == True.
Warning: this task will first ask for an up-to-date calendar, and then proceed to request a booking quote for the first available window based on the listing properties. It therefore uses two Airbnb requests.
Up to this point, we have only had a single worker. Since we are using Docker containers to deploy our services, we can easily replicate (duplicate) a service as many times as our system can accommodate.
To have two workers in total, issue the following command:
docker-compose scale worker=2
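Note that newer versions of docker-compose deprecate the standalone scale command in favour of the --scale option of up; with the compose files used throughout this guide, the equivalent would be:

docker-compose -f docker-compose.yml -f docker-compose-local.yml up -d --scale worker=2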
It is possible to set the system to scheduled mode. In that mode, the system automatically fires predefined tasks at predefined times.
The schedule can be found inside celery.py, and a list of all the available operations can be found on the operations page.
To enable the scheduler, open a console and navigate to the project source folder. There, run the command:
docker-compose up beat
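As with the other services, you can run the scheduler detached and follow its logs with the standard docker-compose options:

docker-compose up -d beat
docker-compose logs -f beat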