First release - Initial implementation #1

Merged
merged 28 commits on Apr 3, 2024
Commits (28)
2edc247
First commit. Template created from cookiecutter-data-science
Feb 26, 2024
6a052d0
Defined a first flow in make_dataset.py (for downloading and exportin…
matteorosato Mar 6, 2024
ecdee32
Added clean_dataset function
matteorosato Mar 7, 2024
1488703
Added Datasource and Idealista classes to handle the flow in an OOP way
matteorosato Mar 8, 2024
647af41
Added information in README.md
matteorosato Mar 12, 2024
aa4ed5a
Added guides from Idealista on how to configure APIs
matteorosato Mar 12, 2024
def82a3
Added results pagination
matteorosato Mar 12, 2024
8f1c70c
filtered_params is not a property anymore
matteorosato Mar 12, 2024
f1b22aa
Changed the way duplicates are deleted
matteorosato Mar 13, 2024
c45a2e3
Major improvements in Idealista class
matteorosato Mar 20, 2024
653f3e1
Added train_model.py:
matteorosato Mar 20, 2024
466b201
separated build_features from clean_dataset
matteorosato Mar 22, 2024
6cf01e7
first version of predict_model.py
matteorosato Mar 22, 2024
a7e4128
added constants.py
matteorosato Mar 22, 2024
f63367f
added results folder to .gitignore
matteorosato Mar 22, 2024
9760f1e
max_pages is now an instance attribute
matteorosato Mar 22, 2024
9f19d39
minor changes in predict_model.py
matteorosato Mar 22, 2024
fefee77
added type hint to make_dataset.py methods
matteorosato Mar 22, 2024
8d7145c
added type hint to train_model.py methods
matteorosato Mar 22, 2024
dfc8123
added type hint to predict_model.py methods
matteorosato Mar 22, 2024
6330cd7
removed some models from train_model.py
matteorosato Mar 22, 2024
bd03382
minor changes in config.toml
matteorosato Mar 22, 2024
b093446
Changed references from "property-finder" to "house-finder"
matteorosato Mar 25, 2024
32ff8dd
modified README.md
matteorosato Mar 26, 2024
88422e8
added run.py
matteorosato Mar 26, 2024
ea93377
printed extra info in create_train_test_df method
matteorosato Apr 3, 2024
a4702ec
minor changes in config.toml
matteorosato Apr 3, 2024
dd48f1f
modified date in CHANGELOG.md
matteorosato Apr 3, 2024
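Two of the commits above ("Added results pagination", "max_pages is now an instance attribute") deal with paging through Idealista search results. The following is a purely illustrative sketch of that kind of loop; the endpoint URL, parameter names, and response fields here are assumptions for illustration, not taken from this PR's Idealista class:

import requests

def fetch_all_pages(search_url: str, token: str, params: dict, max_pages: int) -> list[dict]:
    # Hypothetical paged-search loop: request one page at a time until the API
    # reports no further pages or max_pages is reached.
    results: list[dict] = []
    for page in range(1, max_pages + 1):
        response = requests.post(
            search_url,                                    # assumed endpoint
            headers={"Authorization": f"Bearer {token}"},
            data={**params, "numPage": page},              # assumed parameter name
        )
        response.raise_for_status()
        payload = response.json()
        results.extend(payload.get("elementList", []))     # assumed response field
        if page >= payload.get("totalPages", page):        # assumed response field
            break
    return results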
3 changes: 3 additions & 0 deletions .env.example
@@ -0,0 +1,3 @@
#### IDEALISTA ####
IDEALISTA_API_KEY=myApiKey
IDEALISTA_SECRET=mySecret
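.env.example above lists the two Idealista credentials the project reads from the environment. A minimal sketch of loading them at runtime, assuming python-dotenv is available; the helper function itself is illustrative and not part of this PR:

import os

from dotenv import load_dotenv  # assumes python-dotenv is installed


def load_idealista_credentials() -> tuple[str, str]:
    # Read the variables defined in .env (see .env.example for the expected names).
    load_dotenv()  # picks up a .env file in the working directory, if present
    return os.environ["IDEALISTA_API_KEY"], os.environ["IDEALISTA_SECRET"]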
106 changes: 26 additions & 80 deletions .gitignore
@@ -1,7 +1,6 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so
@@ -19,12 +18,9 @@ lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
@@ -39,87 +35,27 @@ pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
# DotEnv configuration
.env
.venv
env/
@@ -128,17 +64,34 @@ ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject
# Database
*.db
*.rdb

# Pycharm
.idea

# VS Code
.vscode/

# Rope project settings
.ropeproject
# Spyder
.spyproject/

# mkdocs documentation
/site
# Jupyter NB Checkpoints
.ipynb_checkpoints/

# mypy
# exclude specific folders from source control by default
/data/
/results/

# Mac OS-specific storage files
.DS_Store

# vim
*.swp
*.swo

# Mypy
.mypy_cache/
.dmypy.json
dmypy.json
@@ -151,10 +104,3 @@ dmypy.json

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,11 @@
# Changelog

## 0.1.0 (2024-04-01)
#### New Features

* defined project structure
* added scripts for generating data
* added scripts for training and prediction

#### Docs
* added README
144 changes: 144 additions & 0 deletions Makefile
@@ -0,0 +1,144 @@
.PHONY: clean data lint requirements sync_data_to_s3 sync_data_from_s3

#################################################################################
# GLOBALS #
#################################################################################

PROJECT_DIR := $(shell dirname $(realpath $(lastword $(MAKEFILE_LIST))))
BUCKET = [OPTIONAL] your-bucket-for-syncing-data (do not include 's3://')
PROFILE = default
PROJECT_NAME = house-finder
PYTHON_INTERPRETER = python3

ifeq (,$(shell which conda))
HAS_CONDA=False
else
HAS_CONDA=True
endif

#################################################################################
# COMMANDS #
#################################################################################

## Install Python Dependencies
requirements: test_environment
$(PYTHON_INTERPRETER) -m pip install -U pip setuptools wheel
$(PYTHON_INTERPRETER) -m pip install -r requirements.txt

## Make Dataset
data: requirements
$(PYTHON_INTERPRETER) src/data/make_dataset.py data/raw data/processed

## Delete all compiled Python files
clean:
find . -type f -name "*.py[co]" -delete
find . -type d -name "__pycache__" -delete

## Lint using flake8
lint:
flake8 src

## Upload Data to S3
sync_data_to_s3:
ifeq (default,$(PROFILE))
aws s3 sync data/ s3://$(BUCKET)/data/
else
aws s3 sync data/ s3://$(BUCKET)/data/ --profile $(PROFILE)
endif

## Download Data from S3
sync_data_from_s3:
ifeq (default,$(PROFILE))
aws s3 sync s3://$(BUCKET)/data/ data/
else
aws s3 sync s3://$(BUCKET)/data/ data/ --profile $(PROFILE)
endif

## Set up python interpreter environment
create_environment:
ifeq (True,$(HAS_CONDA))
@echo ">>> Detected conda, creating conda environment."
ifeq (3,$(findstring 3,$(PYTHON_INTERPRETER)))
conda create --name $(PROJECT_NAME) python=3
else
conda create --name $(PROJECT_NAME) python=2.7
endif
@echo ">>> New conda env created. Activate with:\nsource activate $(PROJECT_NAME)"
else
$(PYTHON_INTERPRETER) -m pip install -q virtualenv virtualenvwrapper
@echo ">>> Installing virtualenvwrapper if not already installed.\nMake sure the following lines are in shell startup file\n\
export WORKON_HOME=$$HOME/.virtualenvs\nexport PROJECT_HOME=$$HOME/Devel\nsource /usr/local/bin/virtualenvwrapper.sh\n"
@bash -c "source `which virtualenvwrapper.sh`;mkvirtualenv $(PROJECT_NAME) --python=$(PYTHON_INTERPRETER)"
@echo ">>> New virtualenv created. Activate with:\nworkon $(PROJECT_NAME)"
endif

## Test python environment is setup correctly
test_environment:
$(PYTHON_INTERPRETER) test_environment.py

#################################################################################
# PROJECT RULES #
#################################################################################



#################################################################################
# Self Documenting Commands #
#################################################################################

.DEFAULT_GOAL := help

# Inspired by <http://marmelab.com/blog/2016/02/29/auto-documented-makefile.html>
# sed script explained:
# /^##/:
# * save line in hold space
# * purge line
# * Loop:
# * append newline + line to hold space
# * go to next line
# * if line starts with doc comment, strip comment character off and loop
# * remove target prerequisites
# * append hold space (+ newline) to line
# * replace newline plus comments by `---`
# * print line
# Separate expressions are necessary because labels cannot be delimited by
# semicolon; see <http://stackoverflow.com/a/11799865/1968>
.PHONY: help
help:
@echo "$$(tput bold)Available rules:$$(tput sgr0)"
@echo
@sed -n -e "/^## / { \
h; \
s/.*//; \
:doc" \
-e "H; \
n; \
s/^## //; \
t doc" \
-e "s/:.*//; \
G; \
s/\\n## /---/; \
s/\\n/ /g; \
p; \
}" ${MAKEFILE_LIST} \
| LC_ALL='C' sort --ignore-case \
| awk -F '---' \
-v ncol=$$(tput cols) \
-v indent=19 \
-v col_on="$$(tput setaf 6)" \
-v col_off="$$(tput sgr0)" \
'{ \
printf "%s%*s%s ", col_on, -indent, $$1, col_off; \
n = split($$2, words, " "); \
line_length = ncol - indent; \
for (i = 1; i <= n; i++) { \
line_length -= length(words[i]) + 1; \
if (line_length <= 0) { \
line_length = ncol - indent - length(words[i]) - 1; \
printf "\n%*s ", -indent, " "; \
} \
printf "%s ", words[i]; \
} \
printf "\n"; \
}' \
| more $(shell test $(shell uname) = Darwin && echo '--no-init --raw-control-chars')
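The `data` target above runs src/data/make_dataset.py with an input and an output directory (data/raw and data/processed). A minimal sketch of the command-line entry point that call implies; the argument names and the body are illustrative, and the actual make_dataset.py added in this PR may differ:

import argparse
from pathlib import Path


def main(input_dir: Path, output_dir: Path) -> None:
    # Placeholder flow: download the raw Idealista data into input_dir,
    # then clean/export the processed dataset into output_dir.
    output_dir.mkdir(parents=True, exist_ok=True)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Build the processed dataset.")
    parser.add_argument("input_filepath", type=Path)   # e.g. data/raw
    parser.add_argument("output_filepath", type=Path)  # e.g. data/processed
    args = parser.parse_args()
    main(args.input_filepath, args.output_filepath)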