-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Quick & dirty get endpoints for chunks #14
Conversation
app/chunks/models.py
Outdated
Chunk = Annotated[Union[TextChunk, ImageChunk], Field(discriminator='kind')] | ||
|
||
class RetrieveResult(BaseModel): | ||
score: Optional[float] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels a bit unnecessary in terms of nesting, which is what I was originlaly trying to avoid by putting the score in the chunks.
Specifically, now a user needs to do results.map(chunk: chunk.chunk.text)
to get to the text.
Colud we do a middle ground? I think the other improvement (indicating the discriminator) is helpful -- so we could keep BaseChunk
and still split out the Chunk = Annotated[Union[...], Field(discriminator='kind')]
and then use Sequence[Chunk]
in the response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I just ran into that updating the demo app. I'll go back to using a base type and have a per-chunktype field for extra options
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Digging into this more, I think this will need to change once the chunk/embedding data model is in place - we'll be searching for (and scoring) embeddings, and then mapping those back to chunks. I'm going to patch this up a bit for now.
ad3c461
to
e7bca9c
Compare
dewy/chunks/models.py
Outdated
"""The similarity score of this chunk.""" | ||
|
||
text: Optional[str] = Field(..., description="Textual description of the chunk.") | ||
|
||
metadata: Union[TextChunk, ImageChunk] = Field(..., discriminator='kind') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still feels oddly nested -- to get the content of the text chunk, you need to go through metadata. I think what we had here previously was cleaner -- putting things like the score into the chunk so that it is flatter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this was leftover - ignore.
dewy/chunks/models.py
Outdated
|
||
score: Optional[float] = None | ||
|
||
class RetrieveResult(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is RetrieveResult used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doh, orphan code. removing
document_id: int | ||
kind: Literal["image"] = "image" | ||
|
||
image: Optional[str] = Field(..., description="Image of the node.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likely some kind of location int he document? start/end index?
This makes some changes to the chunk data model to make it easier to return as a resource - I expect this will evolve further as we start actually populating the DB with chunks.
c6a1249
to
6de4e32
Compare
This makes some changes to the chunk data model to make it easier to return as a resource - I expect this will evolve further as we start actually populating the DB with chunks.