This repository has been archived by the owner on Feb 22, 2023. It is now read-only.

Add just and pre-commit #201

Closed · wants to merge 24 commits
Changes shown are from 16 of the 24 commits.

Commits (24)
abf1f12
Add pre-commit
sarayourfriend Sep 7, 2021
b24a10a
Run pre-commit on all files
sarayourfriend Sep 7, 2021
fa5d261
Use sh instead of bash
sarayourfriend Sep 7, 2021
d2251db
Add flake8 configuration for editor integration
sarayourfriend Sep 7, 2021
1ef2741
Use pre-commit for linting on CI
sarayourfriend Sep 7, 2021
974e572
Add flake8 and black for editor dependencies
sarayourfriend Sep 7, 2021
98c4c32
Remove duplicated style job
sarayourfriend Sep 8, 2021
abf3e52
Condense logs into a single parameterized command
sarayourfriend Sep 13, 2021
58956d7
Add link to just and clarify that docker-compose still works
sarayourfriend Sep 13, 2021
7bea78a
Disable formatting for es_mapping.py
sarayourfriend Sep 13, 2021
27a402c
Use latest versions of actions
sarayourfriend Sep 13, 2021
30ccacc
Move pre-commit and dev deps into openverse-api Pipfile
sarayourfriend Sep 16, 2021
46f0fba
Add command to lint all files and fix one error
sarayourfriend Sep 16, 2021
ce352ec
Remove unnecessary Pipfile
sarayourfriend Sep 16, 2021
d4e2682
Add back testlocal recipe
sarayourfriend Sep 16, 2021
79b4eef
Fix linting task
sarayourfriend Sep 16, 2021
caa41ee
Use working-directory instead of cd
sarayourfriend Sep 16, 2021
ea9524b
Fix circular import bug introduced during import order correction
dhruvkb Sep 17, 2021
728e4e5
Add missing trailing slash
dhruvkb Sep 17, 2021
f4c9023
Use more cautious curl approach
sarayourfriend Sep 17, 2021
c5192a0
Fix pre-commit-config
sarayourfriend Sep 17, 2021
45810ef
Mention `just` in prereqs for running the repo
sarayourfriend Sep 17, 2021
85f125b
Add explanations for flake8 ignores
sarayourfriend Sep 17, 2021
0683180
Remove duplicated config
sarayourfriend Sep 17, 2021
9 changes: 9 additions & 0 deletions .flake8
@@ -0,0 +1,9 @@
[flake8]
# match black formatter's behavior
ignore = E203, W503
per-file-ignores =
    # Ignore maximum line length rule for test files
    *test*:E501
    *__init__*:F401
    *wsgi.py:E402
max-line-length = 88
5 changes: 2 additions & 3 deletions .github/dependabot.yml
@@ -2,7 +2,7 @@
# Dependabot Configuration File #
#################################

-# current Github-native version of Dependabot
+# current Github-native version of Dependabot
version: 2

updates:
@@ -13,12 +13,11 @@ updates:
# Check for updates once a week
schedule:
interval: 'weekly'

# Enable version updates for Python
- package-ecosystem: 'pip'
# Look for a `Pipfile` in the `/openverse-api` directory
directory: '/openverse-api'
# Check for updates once a week
schedule:
interval: 'weekly'

24 changes: 14 additions & 10 deletions .github/workflows/integration-tests.yml
@@ -9,18 +9,22 @@ on:
  workflow_dispatch:

jobs:
-  Style:
+  Linting:
    runs-on: ubuntu-latest

    steps:
-      - uses: actions/setup-python@v2
-      - name: Install pycodestyle
-        run: pip install pycodestyle
-      - name: Checkout
-        uses: actions/checkout@v2
-      - name: Check API style
-        run: pycodestyle openverse-api/catalog --exclude='openverse-api/catalog/api/migrations,openverse-api/catalog/example_responses.py' --max-line-length=80 --ignore=E402,E702
-      - name: Check ingestion-server style
-        run: pycodestyle ingestion_server/ingestion_server --max-line-length=80 --ignore=E402
+      - uses: actions/checkout@v2
+      - name: Set up Python 3.9
+        uses: actions/setup-python@v2
+        with:
+          python-version: 3.9
+      - name: Lint
+        run: |
+          cd openverse-api
+          pip install --upgrade pipenv
+          pipenv install --dev
+          pipenv run pre-commit run --all-files

  Tests:
    timeout-minutes: 15
    runs-on: ubuntu-latest
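Commit caa41ee later in this PR is titled "Use working-directory instead of cd", so the `cd openverse-api` line above presumably does not survive to the final revision (this page shows only 16 of the 24 commits). A hedged sketch of the same step using the Actions-native `working-directory` option, assuming the rest of the job is unchanged:

```
# Hypothetical final shape of the Lint step (per commit caa41ee):
# working-directory runs the whole script inside openverse-api,
# removing the need for an explicit cd.
- name: Lint
  working-directory: openverse-api
  run: |
    pip install --upgrade pipenv
    pipenv install --dev
    pipenv run pre-commit run --all-files
```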
2 changes: 1 addition & 1 deletion .github/workflows/pr_label_check.yml
@@ -27,4 +27,4 @@ jobs:
      - name: Check goal label
        uses: sugarshin/required-labels-action@v0.3.1
        with:
-          required_oneof: '🌟 goal: addition,🛠 goal: fix,✨ goal: improvement,🧰 goal: internal improvement'
+          required_oneof: '🌟 goal: addition,🛠 goal: fix,✨ goal: improvement,🧰 goal: internal improvement'
56 changes: 56 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,56 @@
exclude: Pipfile\.lock|migrations|\.idea

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.0.1
    hooks:
      - id: trailing-whitespace
      - id: check-executables-have-shebangs
      - id: check-json
      - id: check-case-conflict
      - id: check-toml
      - id: check-merge-conflict
      - id: check-xml
      - id: check-yaml
      - id: end-of-file-fixer
      - id: check-symlinks
      - id: mixed-line-ending
      - id: fix-encoding-pragma
        args:
          - --remove
      - id: pretty-format-json
        args:
          - --autofix
      - id: requirements-txt-fixer

  - repo: https://github.com/PyCQA/isort
    rev: 5.9.1
    hooks:
      - id: isort
        name: Run isort to sort imports
        files: \.py$
        exclude: ^build/.*$|^.tox/.*$|^venv/.*$
        args:
          - --lines-after-imports=2
          - --multi-line=3
          - --trailing-comma
          - --force-grid-wrap=0
          - --use-parentheses
          - --ensure-newline-before-comments
          - --line-length=88

  - repo: https://gitlab.com/pycqa/flake8

> dhruvkb (Member), Sep 16, 2021:
> Just to match the official way their org is named.
>
> Member:
> Their docs show a GitHub URL. Let's use that.
>
> Suggested change:
> -  - repo: https://gitlab.com/pycqa/flake8
> +  - repo: https://github.com/PyCQA/flake8

    rev: 3.9.2
    hooks:
      - id: flake8
        args:
          - --per-file-ignores=*test*:E501,*__init__*:F401,*wsgi.py:E402
          - --max-line-length=88
          - --ignore=E203,W503
> Member:
> I think these args won't be necessary, as the hook will automatically use the config from .flake8.
>
> sarayourfriend (author):
> It won't, unfortunately. pre-commit runs hooks inside a virtualenv that doesn't include the root of your project, so it doesn't know about any of your configuration files 😢
>
> Member:
> I tried this locally and it seemed to work fine. Are you sure about it not working?
>
> Member:
> @sarayourfriend for the hook, is it possible to provide a config path instead of defining the rules twice in two different locations? If not, I think leaving a note here that mentions the rules also living in the other file should be enough.
>
> sarayourfriend (author):
> I guess I'm not sure 😅 I'd tried it in the past and it didn't work, and I found some SO answers that confirmed the issue, but if you've got it working I'll try it again!
>
> sarayourfriend (author):
> Ah, it worked! I removed the duplicated config. Thanks for testing it out!
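Commit 0683180 ("Remove duplicated config") applies this resolution. The final form isn't visible in this 16-commit view, but a minimal sketch of the deduplicated hook, assuming flake8 picks up `.flake8` from the repository root when pre-commit invokes it:

```
# Sketch of the hook with the duplicated args dropped; the ignore
# rules, per-file ignores, and line length then come solely from
# the .flake8 file shown earlier in this diff.
- repo: https://github.com/PyCQA/flake8
  rev: 3.9.2
  hooks:
    - id: flake8
```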


  - repo: https://github.com/ambv/black
    rev: 21.6b0
    hooks:
      - id: black
        args:
          - --safe
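As an aside: the isort arguments in this config reproduce Black-compatible settings by hand, and isort 5 also bundles most of them behind a single profile flag. A sketch of an equivalent hook (an illustration, not what this PR adopts; `--lines-after-imports=2` stays explicit because the black profile does not set it):

```
# Hypothetical equivalent of the isort hook above: the built-in
# black profile enables multi-line=3, trailing commas, parentheses,
# newline-before-comments, and line-length=88 in one flag.
- repo: https://github.com/PyCQA/isort
  rev: 5.9.1
  hooks:
    - id: isort
      args:
        - --profile=black
        - --lines-after-imports=2
```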
44 changes: 23 additions & 21 deletions README.md
@@ -27,10 +27,10 @@ git clone https://github.com/WordPress/openverse-api.git
```

4. Change directories with `cd openverse-api`
-5. Start Openverse API locally by running the docker containers
+5. Start Openverse API locally by running the docker containers. You can use the usual `docker-compose` commands or the simplified `just` commands. You will need the [just](https://github.com/casey/just#installation) command runner installed to follow the next steps.

```
-docker-compose up
+just up
```

6. Wait until your CMD or terminal displays that it is starting development server at `http://0.0.0.0:8000/`
@@ -42,23 +42,35 @@ docker-compose up
10. Still in the new CMD or terminal, load the sample data. This script requires a local postgres installation to connect to and alter our database.

```
-./load_sample_data.sh
+just init
```

11. Still in the new CMD or terminal, hit the API with a request

```
-curl localhost:8000/v1/images?q=honey
+just healthcheck
```

12. Make sure you see the following response from the API
![Sample API_Request](localhost_request.PNG)

Congratulations! You just ran the server locally.

+To access the logs, run:
+
+```
+just logs
+```
+
+That will follow the logs for all the services. To isolate a service, simply pass the service name, for example:
+
+```
+just logs web
+```

### What Happens In the Background

-After executing `docker-compose up` (in Step 5), you will be running:
+After executing `just up` (in Step 5), you will be running:

- A Django API server
- Two PostgreSQL instances (one simulates the upstream data source, the other serves as the application database)
@@ -104,34 +116,24 @@ Every week, the latest version of the data is automatically bulk copied ("ingest…

You can check the health of a live deployment of the API by running the live integration tests.

-1. Change directory to the `openverse-api`
+1. Run the install recipe:

```
-cd openverse-api
+just install
```

#### On the host

-1. Install all dependencies for Openverse API.
-```
-pipenv install
-```
-
-2. Run the tests in a Pipenv subshell.
+1. Run the tests in a Pipenv subshell.
```
-pipenv run bash ./test/run_test.sh
+just testlocal
> Member:
> I got this by trying to run that command:
>
>     error: Justfile does not contain recipe `testlocal`
>
> sarayourfriend (author), Sep 13, 2021:
> Oh hmm, I removed it for some reason. Let me see.
> Do you know why these tests must run in a "pipenv subshell"?
> Edit: Oh never mind, I see that the script calls to pytest.
> This is going to throw a wrench in the root Pipfile idea 😞
>
> Member:
> Personally, I run the tests inside the container; we could leave only those instructions for tests.
>
>     docker-compose exec web bash ./test/run_test.sh
>
> sarayourfriend (author):
> I wonder if it's ever necessary to run it outside of the container 🤔
>
> dhruvkb (Member), Sep 14, 2021:
> Sometimes, I run the code outside the container when debugging it line by line (I like PyCharm's debugger GUI). Running the tests outside the containers helps in that scenario. I'm open to a different process, though, if I'm the only one using it like that.
>
> Member:
> Yes, PyCharm can take the interpreter of a container; it's quite flexible in this aspect. However, I can see the value of the local approach in not being tied to this particular IDE, nor to the use of just, if that is what is blocking here.
>
> Member:
> @krysal from what I had read in the docs, PyCharm can work with a Docker environment only if it created the container. I wasn't able to get it to connect to the existing web container orchestrated by Docker Compose 😢.
>
> sarayourfriend (author):
> Hmm, I don't think that's true, Dhruv; the docs might be incorrect. At my previous job we were able to get it to connect to the Docker container just fine, and it was running using Docker Compose... but maybe things have changed since then.
>
> sarayourfriend (author):
> In any case, after making the changes in the latest commits we're able to add back the testlocal script 🎉
>
> Member:
> The testlocal script works perfectly. So does the regular test one.

```

#### Inside the container

1. Ensure that Docker containers are up. See the section above for instructions.
```
docker-compose ps
```

-2. Run the tests in an interactive TTY connected to a `web` container.
+1. Run the tests in an interactive TTY connected to a `web` container.
```
-docker-compose exec web bash ./test/run_test.sh
+just test
```

### How to Run Ingestion Server tests
55 changes: 28 additions & 27 deletions analytics/attribution_worker.py
@@ -1,13 +1,14 @@
-import settings
import json
import logging as log
import urllib.parse as urlparse
from urllib.parse import parse_qs
from uuid import UUID

+import settings
+from confluent_kafka import Consumer
from models import AttributionReferrerEvent
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
-from confluent_kafka import Consumer


def parse_identifier(resource):
@@ -17,7 +18,7 @@ def parse_identifier(resource):
    if query:
        try:
            query_parsed = parse_qs(query)
-            image_id = query_parsed['image_id'][0]
+            image_id = query_parsed["image_id"][0]
            identifier = str(UUID(image_id))
        except (KeyError, ValueError, TypeError):
            identifier = None
@@ -29,34 +30,34 @@ def parse_message(msg):
        return None
    try:
        decoded = json.loads(msg)
-        decoded = json.loads(scrub_malformed(decoded['message']))
-        resource = decoded['request'].split(' ')[1]
+        decoded = json.loads(scrub_malformed(decoded["message"]))
+        resource = decoded["request"].split(" ")[1]
        _id = parse_identifier(resource)
        parsed = {
-            'http_referer': decoded['http_referer'],
-            'resource': decoded['request'].split(' ')[1],
-            'identifier': _id
+            "http_referer": decoded["http_referer"],
+            "resource": decoded["request"].split(" ")[1],
+            "identifier": _id,
        }
    except (json.JSONDecodeError, KeyError):
-        log.warning(f'Failed to parse {msg}. Reason: ', exc_info=True)
+        log.warning(f"Failed to parse {msg}. Reason: ", exc_info=True)
        parsed = None
    return parsed


def save_message(validated_msg: dict, session):
    event = AttributionReferrerEvent(
-        image_uuid=validated_msg['identifier'],
-        full_referer=validated_msg['http_referer'],
-        referer_domain=urlparse.urlparse(validated_msg['http_referer']).netloc,
-        resource=validated_msg['resource']
+        image_uuid=validated_msg["identifier"],
+        full_referer=validated_msg["http_referer"],
+        referer_domain=urlparse.urlparse(validated_msg["http_referer"]).netloc,
+        resource=validated_msg["resource"],
    )
    session.add(event)
    session.commit()


def scrub_malformed(_json: str):
-    """ Remove some invalid JSON that NGINX sometimes spits out """
-    return _json.replace('\"upstream_response_time\":,', '')
+    """Remove some invalid JSON that NGINX sometimes spits out"""
+    return _json.replace('"upstream_response_time":,', "")


def is_valid(parsed_msg: dict):
Expand All @@ -68,9 +69,9 @@ def is_valid(parsed_msg: dict):
if parsed_msg is None:
return False
try:
referer = parsed_msg['http_referer']
resource = parsed_msg['resource']
valid = 'creativecommons.org' not in referer and '.svg' in resource
referer = parsed_msg["http_referer"]
resource = parsed_msg["resource"]
valid = "creativecommons.org" not in referer and ".svg" in resource
except KeyError:
valid = False
return valid
@@ -83,28 +84,28 @@ def listen(consumer, database):
    while True:
        msg = consumer.poll(timeout=timeout)
        if msg:
-            parsed_msg = parse_message(str(msg.value(), 'utf-8'))
+            parsed_msg = parse_message(str(msg.value(), "utf-8"))
            if is_valid(parsed_msg):
                save_message(parsed_msg, database)
                saved += 1
            else:
                ignored += 1
        else:
-            log.info('No message received in {timeout}')
+            log.info("No message received in {timeout}")
        if saved + ignored % 100 == 0:
-            log.info(f'Saved {saved} attribution events, ignored {ignored}')
+            log.info(f"Saved {saved} attribution events, ignored {ignored}")


-if __name__ == '__main__':
+if __name__ == "__main__":
    log.basicConfig(
        filename=settings.ATTRIBUTION_LOGFILE,
-        format='%(asctime)s %(message)s',
-        level=log.INFO
+        format="%(asctime)s %(message)s",
+        level=log.INFO,
    )
    consumer_settings = {
-        'bootstrap.servers': settings.KAFKA_HOSTS,
-        'group.id': 'attribution_streamer',
-        'auto.offset.reset': 'earliest'
+        "bootstrap.servers": settings.KAFKA_HOSTS,
+        "group.id": "attribution_streamer",
+        "auto.offset.reset": "earliest",
    }
    c = Consumer(consumer_settings)
    c.subscribe([settings.KAFKA_TOPIC_NAME])
14 changes: 10 additions & 4 deletions analytics/backdate.py
@@ -1,12 +1,18 @@
import datetime
+
import settings
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
+
from analytics.report_controller import (
-    generate_usage_report, generate_source_usage_report,
-    generate_referrer_usage_report, generate_top_searches,
-    generate_top_result_clicks
+    generate_referrer_usage_report,
+    generate_source_usage_report,
+    generate_top_result_clicks,
+    generate_top_searches,
+    generate_usage_report,
)
+
+
"""
A one-off script for generating analytics reports back to September 2019, when
we first started collecting analytics data.
@@ -28,4 +34,4 @@
    generate_top_result_clicks(session, start_date, current_end_date)

    current_end_date -= datetime.timedelta(days=1)
-    print(f'Generated backdated reports for {current_end_date}')
+    print(f"Generated backdated reports for {current_end_date}")