Blog Posts:
- Choices, Key Concepts, and Project setup
- Build and Test REST Endpoints With Tornado
- Effective Canonical Logging Across Services
- API, Object, and Storage Data Models
Get the source code for the tutorial:
$ git clone https://github.com/scgupta/tutorial-python-microservice-tornado.git
$ cd tutorial-python-microservice-tornado
$ tree .
.
├── LICENSE
├── README.md
├── addrservice
│ ├── __init__.py
│ ├── database
│ │ ├── __init__.py
│ │ ├── addressbook_db.py
│ │ └── db_engines.py
│ ├── datamodel.py
│ ├── service.py
│ ├── tornado
│ │ ├── __init__.py
│ │ ├── app.py
│ │ └── server.py
│ └── utils
│ ├── __init__.py
│ └── logutils.py
├── configs
│ └── addressbook-local.yaml
├── data
│ ├── __init__.py
│ └── addresses
│ ├── namo.json
│ └── raga.json
├── requirements.txt
├── run.py
├── schema
│ └── address-book-v1.0.json
└── tests
├── __init__.py
├── integration
│ ├── __init__.py
│ ├── addrservice_test.py
│ └── tornado_app_addreservice_handlers_test.py
└── unit
├── __init__.py
├── address_data_test.py
├── addressbook_db_test.py
├── datamodel_test.py
└── tornado_app_handlers_test.py
The directory addrservice
is for the source code of the service, and the directory test
is for keeping the tests.
Setup Virtual Environment:
$ python3 -m venv .venv
$ source ./.venv/bin/activate
$ pip install --upgrade pip
$ pip3 install -r ./requirements.txt
Let's start from scratch:
$ git checkout -b <branch> tag-01-project-setup
You can run static type checker, linter, unit tests, and code coverage by either executing the tool directly or through run.py
script. In each of the following, In each of the following, you can use either of the commands.
Static Type Checker:
$ mypy ./addrservice ./tests
$ ./run.py typecheck
Linter:
$ flake8 ./addrservice ./tests
$ ./run.py lint
Unit Tests:
$ python -m unittest discover tests -p '*_test.py'
$ ./run.py test
Code Coverage:
$ coverage run --source=addrservice --branch -m unittest discover tests -p '*_test.py'
$ coverage run --source=addrservice --branch ./run.py test
After running tests with code coverage, you can get the report:
$ coverage report
Name Stmts Miss Branch BrPart Cover
-----------------------------------------------------------
addrservice/__init__.py 2 2 0 0 0%
You can also generate HTML report:
$ coverage html
$ open htmlcov/index.html
If you are able to run all these commands, your project setup has no error and you are all set for coding.
Checkout the code:
$ git checkout -b <branch> tag-02-microservice
File addrservice/service.py
has business logic for CRUD operations for the address-book. This file is indpendent of any web service framework.
It currenly has just stubs with rudimentry implementation keeing addresses in a dictionary. It is sufficint to implement and test the REST service endpoints.
Tornado is a framework to develop Python web/microservices. It uses async effectively to achieve high number of open connections. In this tutorial, we create a tornado.web.Application
and add tornado.web.RequestHandlers
in file addrservice/tornado/app.py
to serve various API endpoints for this address service. Tornado also has a rich framework for testing.
Web services return HTML back. In address book microservice, API data interface is JSON. We will examine key Tornado APIs of Application
, RequestHandler
and tornado.testing
to develop it.
But first, let's run the server and test it:
$ python3 addrservice/tornado/server.py --port 8080 --config ./configs/addressbook-local.yaml --debug
Starting Address Book on port 8080 ...
Also run lint, typecheck and test to verify nothing is broken, and also code coverage:
$ ./run.py lint
$ ./run.py typecheck
$ ./run.py test -v
$ coverage run --source=addrservice --omit="addrservice/tornado/server.py" --branch ./run.py test
$ coverage report
Name Stmts Miss Branch BrPart Cover
-------------------------------------------------------------------
addrservice/__init__.py 2 0 0 0 100%
addrservice/service.py 23 1 0 0 96%
addrservice/tornado/__init__.py 0 0 0 0 100%
addrservice/tornado/app.py 83 4 8 3 92%
-------------------------------------------------------------------
TOTAL 108 5 8 3 93%
The addrservice/tornado/server.py
has been omitted from coverage. This is the file used to start the server. Since Torando test framework has a mechanism to start the server in the same process where tests are running, this file does not get tested by unit and integration tests.
These are the addressbook API endpoints, implemented through two Request Handlers:
AddressBookRequestHandler
:
GET /addresses
: gets all addresses in the address bookPOST /addresses
: create an entry in the addressbook
AddressBookEntryRequestHandler
:
GET /addresses/{id}
: get the address book entry with given idPUT /addresses/{id}
: update the address book entry with given idDELETE /addresses/{id}
: delete the address book entry with given id
Here is a sample session exercising all endpoints (notice the POST response has Location in the Headers containing the URI/id 66fdbb78e79846849608b2cfe244a858
of the entry that gets created):
# Create an address entry
$ curl -i -X POST http://localhost:8080/addresses -d '{"full_name": "Bill Gates"}'
HTTP/1.1 201 Created
Server: TornadoServer/6.0.3
Content-Type: text/html; charset=UTF-8
Date: Tue, 10 Mar 2020 14:40:01 GMT
Location: /addresses/66fdbb78e79846849608b2cfe244a858
Content-Length: 0
Vary: Accept-Encoding
# Read the address entry
$ curl -i -X GET http://localhost:8080/addresses/66fdbb78e79846849608b2cfe244a858
HTTP/1.1 200 OK
Server: TornadoServer/6.0.3
Content-Type: application/json; charset=UTF-8
Date: Tue, 10 Mar 2020 14:44:26 GMT
Etag: "5496aee01a83cf2386641b2c43540fc5919d621e"
Content-Length: 22
Vary: Accept-Encoding
{"full_name": "Bill Gates"}
# Update the address entry
$ curl -i -X PUT http://localhost:8080/addresses/66fdbb78e79846849608b2cfe244a858 -d '{"full_name": "William Henry Gates III"}'
HTTP/1.1 204 No Content
Server: TornadoServer/6.0.3
Date: Tue, 10 Mar 2020 14:48:04 GMT
Vary: Accept-Encoding
# List all addresses
$ curl -i -X GET http://localhost:8080/addresses
HTTP/1.1 200 OK
Server: TornadoServer/6.0.3
Content-Type: application/json; charset=UTF-8
Date: Tue, 10 Mar 2020 14:49:10 GMT
Etag: "5601e676f3fa4447feaa8d2dd960be163af7570a"
Content-Length: 73
Vary: Accept-Encoding
{"66fdbb78e79846849608b2cfe244a858": {"full_name": "William Henry Gates III"}}
# Delete the address
$ curl -i -X DELETE http://localhost:8080/addresses/66fdbb78e79846849608b2cfe244a858
HTTP/1.1 204 No Content
Server: TornadoServer/6.0.3
Date: Tue, 10 Mar 2020 14:50:38 GMT
Vary: Accept-Encoding
# Verify address is deleted
$ curl -i -X GET http://localhost:8080/addresses
HTTP/1.1 200 OK
Server: TornadoServer/6.0.3
Content-Type: application/json; charset=UTF-8
Date: Tue, 10 Mar 2020 14:52:01 GMT
Etag: "bf21a9e8fbc5a3846fb05b4fa0859e0917b2202f"
Content-Length: 2
Vary: Accept-Encoding
{}
$ curl -i -X GET http://localhost:8080/addresses/66fdbb78e79846849608b2cfe244a858
HTTP/1.1 404 '66fdbb78e79846849608b2cfe244a858'
Server: TornadoServer/6.0.3
Content-Type: application/json; charset=UTF-8
Date: Tue, 10 Mar 2020 14:53:06 GMT
Content-Length: 1071
Vary: Accept-Encoding
{"method": "GET", "uri": "/addresses/66fdbb78e79846849608b2cfe244a858", "code": 404, "message": "'66fdbb78e79846849608b2cfe244a858'", "trace": "Traceback (most recent call last):\n\n File \"... redacted call stack trace ... addrservice/tornado/app.py\", line 100, in get\n raise tornado.web.HTTPError(404, reason=str(e))\n\ntornado.web.HTTPError: HTTP 404: '66fdbb78e79846849608b2cfe244a858'\n"}
Checkout the code:
$ git checkout -b <branch> tag-03-logging
Effective logs can cut down diagnosis time and facilitate monitoring and altering.
Logfmt log format consists of key-value pairs. It offers good balance between processing using standard tools and human readibility.
Emiting one canonical log line](https://brandur.org/canonical-log-lines) for each request makes manual inspection easier. Assigning and logging a request id to each request, and passing that id to all called service helps correlate logs across services. The key-value pairs for the log are stored in a task context, which is maintained across asyncio task interleaving.
Logging are useful in diagnosing services, more so when async is involved. Python has a standard logging package, and its documentation includes an excellent HOWTO guide and Cookbook. These are rich source of information, and leave nothoing much to add. Following are some of the best practices in my opinion:
- Do NOT use ROOT logger directly throgh
logging.debug()
,logging.error()
methods directly because it is easy to overlook their default behavior. - Do NOT use module level loggers of variety
logging.getLogger(__name__)
because any complex project will require controlling logging through configuration (see next point). These may cause surprise if you forget to setdisable_existing_loggers
to false or overlook how modules are loaded and initialized. If use at all, calllogging.getLogger(__name__)
inside function, rather than outside at the beginning of a module. dictConfig
(inyaml
) offers right balance of versatility and flexibility compared toini
basedfileConfig
or doing it in code. Specifying logger in config files allows you to use different logging levels and infra in prod deployment, stage deployments, and local debugging (with increasingly more logs).
Sending logs to multiple data stores and tools for processing can be controled by a log configuration. Each logger has a format and multiple handlers can be associated with a logger. Here is a part of configs/addressbook-local.yaml
:
logging:
version: 1
formatters:
brief:
format: '%(asctime)s %(name)s %(levelname)s : %(message)s'
detailed:
format: 'time="%(asctime)s" logger="%(name)s" level="%(levelname)s" file="%(filename)s" lineno=%(lineno)d function="%(funcName)s" %(message)s'
handlers:
console:
class: logging.StreamHandler
level: INFO
formatter: brief
stream: ext://sys.stdout
file:
class : logging.handlers.RotatingFileHandler
level: DEBUG
formatter: detailed
filename: /tmp/addrservice-app.log
backupCount: 3
loggers:
addrservice:
level: DEBUG
handlers:
- console
- file
propagate: no
tornado.access:
level: DEBUG
handlers:
- file
tornado.application:
level: DEBUG
handlers:
- file
tornado.general:
level: DEBUG
handlers:
- file
root:
level: WARNING
handlers:
- console
Notice that this configuration not just defines a logger addrservice
for this service, but also modifies behavior of Tornado's general logger. There are several pre-defined handlers. Here the SteamHandler and RotatingFileHandler are being used to write to console and log files respectively.
Tornado has several hooks to control when and how logging is done:
log_function
: function Tornado calls at the end of every request to log the result.write_error
: to customize the error response. Information about the error is added to the log context.log_exception
: to log uncaught exceptions. It can be overwritten to log in logfmt format.
Start the server:
It will show the console log:
$ python3 addrservice/tornado/server.py --port 8080 --config ./configs/addressbook-local.yaml --debug
2020-03-17 12:54:15,198 addrservice INFO : message="STARTING" service_name="Address Book" port=8080
Watch the logs:
$ tail -f /tmp/addrservice-app.log
time="2020-03-17 12:54:15,198" logger="addrservice" level="INFO" file="logutils.py" lineno=57 function="log" message="STARTING" service_name="Address Book" port=8080
Send a request:
$ curl -i -X POST http://localhost:8080/addresses -d '{"name": "Bill Gates"}'
HTTP/1.1 201 Created
Server: TornadoServer/6.0.3
Content-Type: text/html; charset=UTF-8
Date: Tue, 17 Mar 2020 07:26:32 GMT
Location: /addresses/7feec2df29fd4b039028ad351bafe422
Content-Length: 0
Vary: Accept-Encoding
The console log will show brief log entries:
2020-03-17 12:56:32,784 addrservice INFO : req_id="e6cd3072530f46b9932946fd65a13779" method="POST" uri="/addresses" ip="::1" message="RESPONSE" status=201 time_ms=1.2888908386230469
The log file will show logfmt-style one-line detailed canonical log entries:
time="2020-03-17 12:56:32,784" logger="addrservice" level="INFO" file="logutils.py" lineno=57 function="log" req_id="e6cd3072530f46b9932946fd65a13779" method="POST" uri="/addresses" ip="::1" message="RESPONSE" status=201 time_ms=1.2888908386230469
Tests are quiet by default:
$ ./run.py lint
$ ./run.py typecheck
$ ./run.py test -v
test_address_book_endpoints (integration.tornado_app_addreservice_handlers_test.TestAddressServiceApp) ... ok
test_default_handler (unit.tornado_app_handlers_test.AddressServiceTornadoAppUnitTests) ... ok
----------------------------------------------------------------------
Ran 2 tests in 0.049s
OK
$ coverage run --source=addrservice --omit="addrservice/tornado/server.py" --branch ./run.py test
..
----------------------------------------------------------------------
Ran 2 tests in 0.049s
OK
$ coverage report
Name Stmts Miss Branch BrPart Cover
-------------------------------------------------------------------
addrservice/__init__.py 3 0 0 0 100%
addrservice/service.py 25 1 0 0 96%
addrservice/tornado/__init__.py 0 0 0 0 100%
addrservice/tornado/app.py 105 6 18 6 90%
addrservice/utils/__init__.py 0 0 0 0 100%
addrservice/utils/logutils.py 28 0 6 0 100%
-------------------------------------------------------------------
TOTAL 161 7 24 6 93%
If you want to change the log message during tests, change log level from ERROR to INFO:
# tests/unit/tornado_app_handlers_test.py
IN_MEMORY_CFG_TXT = '''
service:
name: Address Book Test
logging:
version: 1
root:
level: INFO
'''
With that change, if you run the tests, you can examine the logs:
$ ./run.py tests
INFO:addrservice:req_id="a100e35140604d72930cb16c9eed8e8a" method="GET" uri="/addresses/" ip="127.0.0.1" message="RESPONSE" status=200 time_ms=1.232147216796875
INFO:addrservice:req_id="29b08c81acbd403b89f007ba03b5fee7" method="POST" uri="/addresses/" ip="127.0.0.1" message="RESPONSE" status=201 time_ms=0.9398460388183594
WARNING:addrservice:req_id="1c959a77f9de4f7e87e384a174fb6fbe" method="POST" uri="/addresses/" ip="127.0.0.1" reason="Invalid JSON body" message="RESPONSE" status=400 time_ms=1.7652511596679688 trace="Traceback.....
Get the code:
$ git checkout -b <branch> tag-04-datamodel
Also known as communication or exchange data model The data model for interacting with a microservice. It is designed for efficiently exchanging (sending and receiving) data with the service.
The address book service uses JSON for exchanging data. The JSON schema for the data model is in schema/address-book-v1.0.json
, and test data in data/addresses/*.json
. Even the data must be tested to be correct. So there is a test tests/unit/address_data_test.py
to check whether data files conform to the JSON schema.
$ python3 tests/unit/address_data_test.py
..
----------------------------------------------------------------------
Ran 2 tests in 0.006s
OK
Also known as application data model or data structures. It is designed for efficiently performing business logic (algorithms) of an application / service.
There are tools like Python JSON Schema Objects, Warlock, Valideer, that generate POPO (Plain Old Python Object) classes from a JSON schema. These tools do simple structural mapping from JSON schema elements to classes. However, there are validation checks, inheritance, and polymorphism that can't be expressed in JSON schema. So it may require hand-crafting a data model suitable for business logic.
The logical data model is implemented in addrservice/datamodel.py
.
Also known as Physical Data Model. It is designed for efficient storage, retrieval, and search. There are several kinds of data stores: relational, hierarchical, graph. A combination of these storage is picked depending upon the structure of the persistent data, and retrieval and search requirements.
The addrservice/database/addressbook_db.py
defines an AbstractAddressBookDB
, which the service interacts with. This decouples the storage choice, and allows changing the storage model without affecting rest of the code. For example, it defines an InMemoryAddressBookDB
and FileAddressBookDB
. The in-memory data store is useful in unit/integration tests as it facilitates deep asserts for the state of the store. The file backed storage persists the data in files, and useful for debugging.
The storage can be swapped by setting the configuration, for example:
addr-db:
memory: null
Based on the config, an appropriate DB engine is set up by addrservice/database/db_engines.py:create_addressbook_db()
.
Following is need to implement a SQL data store:
- Implement a sub-class of
AbstractAddressBookDB
that stores data in a RDBMS - Add a case in
create_addressbook_db()
- Change config (
configs/addressbook-local.yaml
)
It does not require touch any of the business logic.
Start the service:
$ python3 addrservice/tornado/server.py --port 8080 --config ./configs/addressbook-local.yaml --debug
2020-03-30 06:46:29,641 addrservice INFO : message="STARTING" service_name="Address Book" port=8080
Test CRUD with curl command:
$ curl -X 'GET' http://localhost:8080/addresses
{}
$ curl -i -X 'POST' -H "Content-Type: application/json" -d "@data/addresses/namo.json" http://localhost:8080/addresses
HTTP/1.1 100 (Continue)
HTTP/1.1 201 Created
Server: TornadoServer/6.0.3
Content-Type: text/html; charset=UTF-8
Date: Mon, 30 Mar 2020 01:22:10 GMT
Location: /addresses/0bc13fbf2db54ef392a08a37378afa7f
Content-Length: 0
Vary: Accept-Encoding
$ curl -X 'GET' http://localhost:8080/addresses/0bc13fbf2db54ef392a08a37378afa7f
{"full_name": "Narendra Modi", "addresses": ...}
$ curl -i -X 'PUT' -H "Content-Type: application/json" -d "@data/addresses/raga.json" http://localhost:8080/addresses/0bc13fbf2db54ef392a08a37378afa7f
HTTP/1.1 204 No Content
Server: TornadoServer/6.0.3
Date: Mon, 30 Mar 2020 01:24:51 GMT
Vary: Accept-Encoding
$ curl -X 'GET' http://localhost:8080/addresses/0bc13fbf2db54ef392a08a37378afa7f
{"full_name": "Rahul Gandhi", "addresses": ...}
$ curl -i -X 'DELETE' http://localhost:8080/addresses/0bc13fbf2db54ef392a08a37378afa7f
HTTP/1.1 204 No Content
Server: TornadoServer/6.0.3
Date: Mon, 30 Mar 2020 01:26:34 GMT
Vary: Accept-Encoding
$ curl -X 'GET' http://localhost:8080/addresses
{}
Change addr-db
in configs/addressbook-local.yaml
from memory to file system:
addr-db:
fs: /tmp/addrservice-db
Run all the commands again. You will see records being stored in /tmp/addrservice-db
as json files.
Run tests and check code coverage:
$ coverage run --source=addrservice --omit="addrservice/tornado/server.py" --branch ./run.py test
..........
----------------------------------------------------------------------
Ran 10 tests in 0.148s
OK
$ coverage report
Name Stmts Miss Branch BrPart Cover
--------------------------------------------------------------------------
addrservice/__init__.py 7 0 0 0 100%
addrservice/database/__init__.py 0 0 0 0 100%
addrservice/database/addressbook_db.py 107 5 28 1 96%
addrservice/database/db_engines.py 6 0 2 0 100%
addrservice/datamodel.py 226 0 54 0 100%
addrservice/service.py 36 0 2 0 100%
addrservice/tornado/__init__.py 0 0 0 0 100%
addrservice/tornado/app.py 107 2 20 4 95%
addrservice/utils/__init__.py 0 0 0 0 100%
addrservice/utils/logutils.py 28 0 6 0 100%
--------------------------------------------------------------------------
TOTAL 517 7 112 5 98%