Add lancedb ner example #912

Merged: skrawcz merged 5 commits into main from add_lancedb_ner_example on May 22, 2024
skrawcz (Collaborator) commented May 21, 2024

See commits.

This is almost there except for the example code -- I also need to create a notebook for this to run properly.

Changes

  • adds some materializers
  • adds union type check support

How I tested this

Notes

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

sweep-ai bot (Contributor) commented May 21, 2024

Sweep: PR Review

Sweep is currently reviewing your PR...

elijahbenizzy (Collaborator) left a comment

Looking good -- just a few minor hygiene items

examples/LLM_Workflows/NER_Example/run.py (outdated, resolved)
examples/LLM_Workflows/NER_Example/run.py (resolved)
hamilton/plugins/huggingface_extensions.py (resolved)
hamilton/plugins/huggingface_extensions.py (resolved)
hamilton/plugins/huggingface_extensions.py (resolved)

skrawcz (Collaborator, Author) commented May 21, 2024 via email

elijahbenizzy (Collaborator) left a comment

Looks fine. I think we should probably break it up into more functions/modules, but I could see it going either way.

skrawcz added 5 commits May 21, 2024 18:30
This shows how one might build a pipeline and utilize
models to extract entities and embeddings. Then
save them to lancedb, and then use both to query
over them.

WIP (+4 squashed commits)
Squashed commits:
[c91afb6] wip
[17cd297] Gets example to run on HF datasets properly

TODOs:
 - tidy up
 - README
 - remove parallel in favor of discussion
[f840934] TODOs:

1. remove parallel - doesn't make sense for GPU case as you can't parallelize that, and you want to use datasets.map() for batching.
2. make it run on datasets
[b76d011] WIP create lanceDB NER example
This adds support for loading Hugging Face datasets.

It also supports saving them to parquet and to lancedb.

Adds tests.

Putting the lancedb saver here is arbitrary, but because we
would need to check installed dependencies either way,
I felt it would be simpler to put it here for now. Ideally we could
convert between common formats to help here, e.g. pyarrow
tables could simplify things.
Say we have this and want to save it with a saver:

```python
from typing import Union

def foo() -> Union[int, float]:
    return ...
```

If the saver's applicable_types is [int, float], this previously failed; now
it passes. Added a test for this.

If the saver's applicable_types is just `float` or just `int`, then this rightly fails -- added
a test for that explicitly.
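
To make the rule above concrete, here is a minimal sketch of the kind of check described. It is not the code from this PR: the helper name `union_is_covered` and its signature are assumptions made for illustration, and it only handles `typing.Union`, not the `int | float` syntax.

```python
from typing import Any, Sequence, Union, get_args, get_origin


def union_is_covered(node_type: Any, applicable_types: Sequence[type]) -> bool:
    """Hypothetical helper, not Hamilton's API.

    Returns True only if every member of `node_type` is one of the
    saver's applicable types.
    """
    # Unpack Union[...] into its members; treat a bare type as a single member.
    if get_origin(node_type) is Union:
        members = get_args(node_type)
    else:
        members = (node_type,)
    return all(member in applicable_types for member in members)


# Saver accepts both members of the union, so the check passes.
print(union_is_covered(Union[int, float], [int, float]))
# Saver accepts only one member, so the check fails.
print(union_is_covered(Union[int, float], [float]))
```

Requiring every member to match, rather than any one of them, mirrors the behavior described above: a saver that only handles `float` cannot safely accept a function that may return an `int`.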
Also cleans up the notebook and adds comments to the code.
Makes some changes to make sure things run on Google Colab.

Plus some minor documentation / wording updates.
skrawcz force-pushed the add_lancedb_ner_example branch from f0088e8 to 328c227 May 22, 2024 01:31
skrawcz merged commit 1654d14 into main May 22, 2024
23 checks passed
skrawcz deleted the add_lancedb_ner_example branch May 22, 2024 06:04
3 participants