This repository was created as part of the coursework for the Programming for Data Analysis module in the Higher Diploma in Computer Programming in Data Analytics provided by Atlantic Technological University.
This repository was created and edited using Jupyter Notebooks.
This project analyses data from an Airbnb dataset for Dublin, Ireland, performing statistical analysis on certain variables in addition to similar analysis on a synthesised dataset. To get started, the notebook containing all the analysis for this project is found at the following link: https://github.com/AndrewShanahan/PFDA_Assignment/blob/main/PfDA_Assignment.ipynb
From the Airbnb dataset, I identified five variables to focus on and analyse:
- host_id (integer)
- host_listings_count (float)
- reviews_per_month (float)
- review_scores_rating (float)
- price (object) - Please note, during the project there were instances where an object was not the most useful data type. In the interest of time, I amended this column to a float rather than performing a separate task to remove any currency symbols (e.g. €, $, £).
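For users whose copy of the dataset still stores price as text, the following is a minimal sketch of one way to convert such a column to a float with pandas. It is not taken from the notebook: the function name `price_to_float` and the sample values are hypothetical, and it assumes prices use a full stop as the decimal separator and a comma as the thousands separator.

```python
import pandas as pd


def price_to_float(prices: pd.Series) -> pd.Series:
    """Convert price strings such as "$1,250.00" to floats (hypothetical helper)."""
    # Drop everything except digits, the decimal point and a minus sign,
    # which removes currency symbols and thousands separators.
    cleaned = prices.astype(str).str.replace(r"[^0-9.\-]", "", regex=True)
    # Anything that still cannot be parsed (e.g. a missing value) becomes NaN.
    return pd.to_numeric(cleaned, errors="coerce")


# Made-up sample values for illustration only, not rows from the dataset.
sample = pd.Series(["$1,250.00", "€85.50", "£40", None])
converted = price_to_float(sample)
print(converted)
```

Using `errors="coerce"` means unparseable entries are kept as missing values rather than stopping the conversion, so they can be inspected or dropped afterwards.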
Please note that as the project evolved I also used some other variables, for example when plotting graphs and charts and during data synthesis.
The references section below may help users to understand the project.
Running or modifying the notebook on a local machine requires a recent version of Python. Anaconda is an easy-to-use distribution available for Windows, Mac and Linux operating systems. Alternatively, there are a number of browser-based tools available, such as Jupyter Notebook, which was used during this project.
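As a rough illustration of what data synthesis can look like, the sketch below draws a small synthetic dataset with the NumPy random `Generator` methods listed in the references (`normal`, `lognormal`, `uniform`). It is not the notebook's method: the seed, row count, distribution choices and every parameter value are placeholder assumptions rather than values fitted to the Airbnb data, and it assumes review scores are on a 0 to 5 scale.

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is reproducible; the seed is arbitrary.
rng = np.random.default_rng(seed=42)
n_rows = 1000  # placeholder sample size

synthetic = pd.DataFrame({
    # Whole-number listing counts from 1 to 19, stored as floats like the source column.
    "host_listings_count": rng.integers(1, 20, size=n_rows).astype(float),
    # Uniform draw between 0 and 10 reviews per month (assumed range).
    "reviews_per_month": rng.uniform(0.0, 10.0, size=n_rows),
    # Normal draw clipped to an assumed 0-5 rating scale.
    "review_scores_rating": np.clip(rng.normal(loc=4.6, scale=0.4, size=n_rows), 0.0, 5.0),
    # Log-normal draw, which is right-skewed and always positive, as prices tend to be.
    "price": rng.lognormal(mean=4.5, sigma=0.6, size=n_rows),
})

print(synthetic.describe())
```

In practice the parameters would be estimated from the real columns first, so that the synthetic variables resemble the originals.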
Information on how to install and run Jupyter Notebooks can be found through the following link:
https://docs.jupyter.org/en/latest/install.html
Information on how to launch Jupyter Notebook from a terminal can be found through the following link:
https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/execute.html
Dataset:
[01] Inside Airbnb - http://insideairbnb.com/get-the-data/
Datacamp - numerous courses and tracks completed over the last number of months have supported this exercise
Python/General:
[02] Udemy course - https://www.udemy.com/course/the-modern-python3-bootcamp/learn/lecture/8680110?start=94#overview
[03] Software Freedom Conservancy. Git - https://git-scm.com/
[04] Datacamp - https://www.datacamp.com/
[05] W3Schools - https://www.w3schools.com/python/default.asp
[06] Stack Overflow - https://stackoverflow.com/
[07] Numpy/Random generator - https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.normal.html#numpy.random.Generator.normal
[08] Plot labels, titles, legend - lecturer video - https://web.microsoftstream.com/video/10974869-e53e-4621-961e-6a6922203374
Data Synthesis:
[09] Lecturer video - https://web.microsoftstream.com/video/84fb76a5-0c81-4ac9-8548-d8a6ed609366
[10] https://www.simplilearn.com/top-python-libraries-for-data-science-article#7_scikitlearn
[11] https://scikit-learn.org/stable/
[12] https://scikit-learn.org/stable/supervised_learning.html#supervised-learning
[13] https://www.freecodecamp.org/news/python-functions-define-and-call-a-function/
[14] https://www.projectpro.io/recipes/create-simulated-data-for-classification-in-python
[15] https://www.geeksforgeeks.org/how-to-create-simulated-data-for-classification-in-python/
[16] https://towardsdatascience.com/https-medium-com-faizanahemad-generating-synthetic-classification-data-using-scikit-1590c1632922
[17] https://stackabuse.com/generating-synthetic-data-with-numpy-and-scikit-learn/
[18] Troubleshooting - https://stackoverflow.com/questions/45554008/error-in-python-script-expected-2d-array-got-1d-array-instead
[19] https://stackoverflow.com/questions/22071987/generate-random-array-of-floats-between-a-range
[20] https://www.w3schools.com/python/ref_random_uniform.asp
[21] https://en.wikipedia.org/wiki/NumPy
Distribution:
[22] https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.lognormal.html#numpy.random.Generator.lognormal
Jupyter Notebooks:
[23] https://stackoverflow.com/questions/48655801/tables-in-markdown-in-jupyter
Readme file editing:
[24] https://medium.com/analytics-vidhya/the-jupyter-notebook-formatting-guide-873ab39f765e
[25] https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-readmes
ATU:
[26] Software Freedom Conservancy. Git - https://git-scm.com/
[27] https://www.atu.ie/sites/default/files/2022-08/Student%20Code_Final_August_2022.pdf
Inspiration:
[28] Karsten Jeschkies: https://github.com/jeschkies/gensim