Showing 64 changed files with 749 additions and 1,442 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,35 +1,42 @@ | ||
# Async database transactions and FastAPI | ||
# DBHub | ||
|
||
This repo aims to provide working code and reproducible setups for asynchronous data ingestion and querying from numerous databases via Python. Wherever possible, data is ingested to a database via their supported async Python drivers, and the data is also queried in async fashion on top of FastAPI endpoints. | ||
## Boilerplate for async ingestion and querying of DBs | ||
|
||
Example code is provided for numerous databases, along with FastAPI docker deployments that allow you to easily supply complex query results to downstream applications. | ||
This repo aims to provide working code and reproducible setups for bulk data ingestion and querying from numerous databases via their Python clients. Wherever possible, async database client APIs are utilized for data ingestion. The query interface to the data is exposed via async FastAPI endpoints. To enable reproducibility across environments, Dockerfiles are provided as well. | ||
|
||
#### Currently implemented | ||
The `docker-compose.yml` does the following: | ||
1. Set up a local DB server in a container | ||
2. Set up local volume mounts to persist the data | ||
3. Set up a FastAPI server in another container | ||
4. Set up a network bridge such that the DB server can be accessed from the FastAPI server | ||
5. Tear down all the containers once development and testing is complete | ||
|
||
### Currently implemented | ||
* Neo4j | ||
* Elasticsearch | ||
* Meilisearch | ||
* Qdrant | ||
* Weaviate | ||
|
||
#### 🚧 Coming soon | ||
### 🚧 In the pipeline | ||
* LanceDB | ||
* SurrealDB | ||
* MongoDB | ||
|
||
|
||
## Goals | ||
|
||
The primary aim is to compare the data ingestion and query performance of various databases that can be used for a host of downstream use cases. Two use cases are of particular interest: | ||
The main goals of this repo are explained as follows. | ||
|
||
1. We may want to expose (potentially sensitive) data to downstream client applications, so building an API on top of the database can be a very useful tool to share the data in a controlled manner | ||
1. **Ease of setup**: There are tons of databases and client APIs out there, so it's useful to have a clean, efficient and reproducible workflow to experiment with a range of datasets, as well as databases for the problem at hand. | ||
|
||
2. Databases or data stores in general can be important "sources of truth" for contextual querying via LLMs like ChatGPT, allowing us to ground our model's results with factual data. APIs allow us to add another layer to simplify querying a host of backends in a way that doesn't rely on the LLM learning a specific query language. | ||
2. **Ease of distribution**: We may want to expose (potentially sensitive) data to downstream client applications, so building an API on top of the database can be a very useful tool to share the data in a controlled manner | ||
|
||
In general, it's useful to have a clean, efficient and reproducible workflow to experiment with each database in question. | ||
3. **Ease of testing advanced use cases**: Search databases (either full-text keyword search or vector DBs) can be important "sources of truth" for contextual querying via LLMs like ChatGPT, allowing us to ground our model's results with factual data. | ||
|
||
|
||
## Pre-requisites | ||
|
||
Install Docker and the latest version of Python (3.11+), as recent syntactic improvements in Python are extensively utilized in the code provided. | ||
|
||
## About the dataset | ||
|
||
The [dataset provided](https://github.com/prrao87/async-db-fastapi/tree/main/data) in this repo is a formatted version of the version obtained from Kaggle datasets. Full credit is due to [the original author](https://www.kaggle.com/zynicide) via Kaggle for curating this dataset. | ||
* Python 3.10+ | ||
* Docker | ||
* A passion to learn more about and experiment with databases! |
@@ -1,14 +1,16 @@
-from pydantic import BaseSettings
+from pydantic_settings import BaseSettings, SettingsConfigDict
 
 
 class Settings(BaseSettings):
+    model_config = SettingsConfigDict(
+        env_file=".env",
+        extra="allow",
+    )
+
     elastic_service: str
     elastic_user: str
     elastic_password: str
     elastic_url: str
     elastic_port: int
     elastic_index_alias: str
     tag: str
-
-    class Config:
-        env_file = ".env"
File renamed without changes.
File renamed without changes.
@@ -0,0 +1,80 @@
+from pydantic import BaseModel, ConfigDict, Field, model_validator
+
+
+class Wine(BaseModel):
+    model_config = ConfigDict(
+        populate_by_name=True,
+        validate_assignment=True,
+        extra="allow",
+        str_strip_whitespace=True,
+        json_schema_extra={
+            "example": {
+                "id": 45100,
+                "points": 85,
+                "title": "Balduzzi 2012 Reserva Merlot (Maule Valley)",
+                "description": "Ripe in color and aromas, this chunky wine delivers heavy baked-berry and raisin aromas in front of a jammy, extracted palate. Raisin and cooked berry flavors finish plump, with earthy notes.",
+                "price": 10.0,
+                "variety": "Merlot",
+                "winery": "Balduzzi",
+                "vineyard": "Reserva",
+                "country": "Chile",
+                "province": "Maule Valley",
+                "region_1": "null",
+                "region_2": "null",
+                "taster_name": "Michael Schachner",
+                "taster_twitter_handle": "@wineschach",
+            }
+        },
+    )
+
+    id: int
+    points: int
+    title: str
+    description: str | None
+    price: float | None
+    variety: str | None
+    winery: str | None
+    vineyard: str | None = Field(..., alias="designation")
+    country: str | None
+    province: str | None
+    region_1: str | None
+    region_2: str | None
+    taster_name: str | None
+    taster_twitter_handle: str | None
+
+    @model_validator(mode="before")
+    def _fill_country_unknowns(cls, values):
+        "Fill in missing country values with 'Unknown', as we always want this field to be queryable"
+        country = values.get("country")
+        if country is None or country == "null":
+            values["country"] = "Unknown"
+        return values
+
+    @model_validator(mode="before")
+    def _create_id(cls, values):
+        "Create an _id field because Elastic needs this to store as primary key"
+        values["_id"] = values["id"]
+        return values
+
+
+if __name__ == "__main__":
+    data = {
+        "id": 45100,
+        "points": 85,
+        "title": "Balduzzi 2012 Reserva Merlot (Maule Valley)",
+        "description": "Ripe in color and aromas, this chunky wine delivers heavy baked-berry and raisin aromas in front of a jammy, extracted palate. Raisin and cooked berry flavors finish plump, with earthy notes.",
+        "price": 10,  # Test if field is cast to float
+        "variety": "Merlot",
+        "winery": "Balduzzi",
+        "designation": "Reserva",  # Test if field is renamed
+        "country": "null",  # Test unknown country
+        "province": "  Maule Valley  ",  # Test if field is stripped
+        "region_1": "null",
+        "region_2": "null",
+        "taster_name": "Michael Schachner",
+        "taster_twitter_handle": "@wineschach",
+    }
+    from pprint import pprint
+
+    wine = Wine(**data)
+    pprint(wine.model_dump(), sort_dicts=False)
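
Review note: the two `mode="before"` validators in the schema above operate on the raw input dict before any field validation happens. The sketch below mirrors only their logic as plain functions, so it runs without pydantic installed. The function names `fill_country_unknowns`, `create_id` and `preprocess` are hypothetical, and the sketch makes no claim about the order in which pydantic itself runs the validators or about the exact shape of `model_dump()` output.

```python
# Dependency-free sketch of the preprocessing done by the two "before"
# validators in the Wine schema. Not the pydantic implementation: just the
# same dict manipulation, written as plain functions for illustration.


def fill_country_unknowns(values: dict) -> dict:
    """Replace a missing or "null" country with "Unknown" so it stays queryable."""
    country = values.get("country")
    if country is None or country == "null":
        values["country"] = "Unknown"
    return values


def create_id(values: dict) -> dict:
    """Copy `id` into `_id`, which the schema uses as the Elastic primary key."""
    values["_id"] = values["id"]
    return values


def preprocess(record: dict) -> dict:
    """Apply both steps to a shallow copy so the caller's dict is left untouched."""
    values = dict(record)
    for step in (fill_country_unknowns, create_id):
        values = step(values)
    return values


# Trimmed-down version of the sample record from the diff above.
sample = {"id": 45100, "title": "Balduzzi 2012 Reserva Merlot (Maule Valley)", "country": "null"}
result = preprocess(sample)
print(result["country"], result["_id"])
```

Because the country check treats the literal string `"null"` the same as `None`, records exported with stringified nulls still end up with a queryable `country` value.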
@@ -1,13 +1,13 @@
 # Master key must be at least 16 bytes, composed of valid UTF-8 characters
 MEILI_MASTER_KEY = ""
-MEILI_VERSION = "v1.1.1"
+MEILI_VERSION = "v1.2.0"
 MEILI_PORT = 7700
 MEILI_URL = "localhost"
 MEILI_SERVICE = "meilisearch"
 API_PORT = 8003
 
 # Container image tag
-TAG = "0.1.0"
+TAG = "0.2.0"
 
 # Docker project namespace (defaults to the current folder name if not set)
 COMPOSE_PROJECT_NAME = meili_wine
@@ -1,12 +1,14 @@
-from pydantic import BaseSettings
+from pydantic_settings import BaseSettings, SettingsConfigDict
 
 
 class Settings(BaseSettings):
+    model_config = SettingsConfigDict(
+        env_file=".env",
+        extra="allow",
+    )
+
     meili_service: str
     meili_master_key: str
     meili_port: int
     meili_url: str
     tag: str
-
-    class Config:
-        env_file = ".env"
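
Review note: a likely reason for `extra="allow"` is that the `.env` file carries keys that are not declared as fields on `Settings`. The sketch below checks this with the standard library only. It assumes the Meilisearch `.env` shown earlier in this commit is the file this class loads (the diff does not show file paths), and `parse_env` is a hypothetical, simplified stand-in for a real dotenv parser, not the one pydantic-settings uses.

```python
# Stdlib-only sketch: which keys in the .env file have no matching field
# on the Meilisearch Settings class? Keys are lower-cased for comparison,
# on the assumption that env names are matched case-insensitively.

# Field names declared on the Settings class in the diff above.
DECLARED_FIELDS = {"meili_service", "meili_master_key", "meili_port", "meili_url", "tag"}

# New-side contents of the .env file from this commit.
ENV_TEXT = """
# Master key must be at least 16 bytes, composed of valid UTF-8 characters
MEILI_MASTER_KEY = ""
MEILI_VERSION = "v1.2.0"
MEILI_PORT = 7700
MEILI_URL = "localhost"
MEILI_SERVICE = "meilisearch"
API_PORT = 8003

# Container image tag
TAG = "0.2.0"

# Docker project namespace (defaults to the current folder name if not set)
COMPOSE_PROJECT_NAME = meili_wine
"""


def parse_env(text: str) -> dict[str, str]:
    """Hypothetical minimal parser: `KEY = value` lines, `#` comments, optional double quotes."""
    parsed = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        parsed[key.strip().lower()] = value.strip().strip('"')
    return parsed


env = parse_env(ENV_TEXT)
extra_keys = sorted(set(env) - DECLARED_FIELDS)
print(extra_keys)
```

Under these assumptions, `MEILI_VERSION`, `API_PORT` and `COMPOSE_PROJECT_NAME` have no corresponding field, so a stricter `extra` setting would need either those fields declared or those keys moved out of the file.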
Empty file.