-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Switch DB to postgres #10
Conversation
This uses asyncpg directly. This removes the tags from the methods, and changes the names to be more descriptive. The intent is to have a single `service` generated from the OpenAPI specification that includes methods like `retrieve_chunks` rather than `chunks.retrieve(...)`.
@@ -27,26 +27,26 @@ class RetrieveRequest(BaseModel): | |||
"""Whether to include a generated summary.""" | |||
|
|||
|
|||
class BaseStatement(BaseModel): | |||
class BaseChunk(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
attempting to s/Statements/Chunk
|
||
|
||
@router.post("/retrieve") | ||
async def retrieve(store: StoreDep, request: RetrieveRequest) -> RetrieveResponse: | ||
"""Retrieve statements based on a given query.""" | ||
async def retrieve_chunks(store: StoreDep, request: RetrieveRequest) -> RetrieveResponse: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resource Naming: I think it makes sense for this to be /chunks/retrieve
since it operates on chunks (and conceivably we could have other methods, like list chunks, etc.)
Method Naming: I can see retrieve
(if this is the only retireve method, so we have service.retrieve(...)
) but called it retrieve_chunks
for consistency with other things that name the resource. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
retrieve_chunk
seems fine - it's explicit about what's being returned which may be good.
It will be easier (in the UI) to treat this as a GET
with query params until we have enough query complexity to warrant a full-on post body - it would also simplify the API a bit, allowing this to be treated as just the list view over chunks. I don't want to overly emphasize the FE's needs though.
collection_validator = TypeAdapter(Collection) | ||
|
||
|
||
class CollectionCreate(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how we want to standardize naming. Basically,I have:
Collection
the model that we generally return.CollectionCreate
(for creation). Could also doCreateCollection
orCreateCollectionInput
,CreateCollectionRequest
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like Litestar at least goes with <Resource>
(for the result), <Resource>Create
(for the creation request), and <Resource>Update
for update requests:
class Author(BaseModel):
id: UUID | None
name: str
dob: date | None = None
class AuthorCreate(BaseModel):
name: str
dob: date | None = None
class AuthorUpdate(BaseModel):
name: str | None = None
dob: date | None = None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually ended up with something like CreateCollectionRequest
and CreateCollectionResponse
in the gRPC days, but maybe that's more verbose than is necessary in this case.
session.refresh(collection) | ||
return collection | ||
result = await conn.fetchrow(""" | ||
INSERT INTO collection (name) VALUES ($1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think asyncpg
prepares statements using an LRU cache and the hash of the query. We could (maybe) manually manage that for more efficiency... but should be unnecessary -- we'll likely only have one or two queries per request, so the hash shouldn't be too bad.
"store": Store(), | ||
"engine": engine, | ||
} | ||
# if settings.APPLY_MIGRATIONS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to get yolo to run in the application, but it seems to only support older DB versions. I suspect our best path here is to just migrate ourselves, at least until we do migrations outside of docker compose. We could just have a loop over migration files or smoething if we really want. For now, did somethnig super stupid.
See https://gist.github.com/mattbillenstein/270a4d44cbdcb181ac2ed58526ae137d for an example script we could use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I say KISS
- db:/var/lib/postgresql | ||
networks: | ||
- kb-network | ||
healthcheck: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This causes things that depend on postgres to wait until it reports ready. I'd like to run migrations as part of that, but that is messy.
|
||
CREATE TABLE collection ( | ||
id SERIAL NOT NULL, | ||
name VARCHAR NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point soon, would like to add a tenant
table, and tenant_id
to the collection.
FOREIGN KEY(document_id) REFERENCES document (id) | ||
); | ||
|
||
-- CREATE TABLE ingestion( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to split ingestions out. This could make it possible to add a new ingestion (eg., reconfigure things, and re-ingest) to existing documents, as well as supports changes to documents.
from app.ingest.extract import extract | ||
from app.ingest.extract.source import ExtractSource | ||
from app.ingest.store import Store, StoreDep | ||
|
||
router = APIRouter(tags=["documents"], prefix="/documents") | ||
# TODO: Move this to `/documents`. Will require figuring out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See this TODO. I'd like to flatten it, but that is tricky for the list_documents
case (which collection do we list). Also, need to make sure we can do file upload with a JSON body in the add_document
. So, deferring to a separate PR.
|
||
|
||
@router.post("/retrieve") | ||
async def retrieve(store: StoreDep, request: RetrieveRequest) -> RetrieveResponse: | ||
"""Retrieve statements based on a given query.""" | ||
async def retrieve_chunks(store: StoreDep, request: RetrieveRequest) -> RetrieveResponse: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
retrieve_chunk
seems fine - it's explicit about what's being returned which may be good.
It will be easier (in the UI) to treat this as a GET
with query params until we have enough query complexity to warrant a full-on post body - it would also simplify the API a bit, allowing this to be treated as just the list view over chunks. I don't want to overly emphasize the FE's needs though.
collection_validator = TypeAdapter(Collection) | ||
|
||
|
||
class CollectionCreate(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually ended up with something like CreateCollectionRequest
and CreateCollectionResponse
in the gRPC days, but maybe that's more verbose than is necessary in this case.
"store": Store(), | ||
"engine": engine, | ||
} | ||
# if settings.APPLY_MIGRATIONS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I say KISS
This uses asyncpg directly.
This removes the tags from the methods, and changes the names to be more descriptive. The intent is to have a single
service
generated from the OpenAPI specification that includes methods likeretrieve_chunks
rather thanchunks.retrieve(...)
.