Add example extractors #145
…er even for the tests (not most efficient, but this is what we want to test!)
…hese as these can be tested locally as well
Looks great overall! Thanks David <3
The main comment is to break down the PR into many commits, and move some code around.
… from setup.py for cargo tests
Please make sure to merge #146 before this.
Will move the extractors to https://github.com/tensorlakeai/indexify-extractors and remove them from diptanu/indexify.
Implemented and tested these extractors:
How to reproduce:
(1) Package the extractor into a docker image
cargo run extractor package --dev -v --config-path extractors/simple_invoice_parser.yaml
(2) Run the extractor, mounting any files required
`cargo run` can be replaced by `./target/debug/indexify`, or by `indexify` if the binaries were added to the `PATH`.
For maintainers & contributors:
These steps are usually not needed when you are working only on an extractor! Currently, the extractor-base image is pulled from DockerHub because of how BuildKit works (it does not use a local registry). Unfortunately, this seems tedious to resolve (see moby/buildkit#2343). If you need to modify the Rust code to run the extractors, please run `make build-base-extractor-push`; this will build the image and push it to DockerHub. Then modify `dockerfiles/Dockerfiles.extractor` to use your own `FROM ...` (e.g. `FROM yenicelik/indexify-extractor-base` instead of `FROM diptanu/indexify-extractor-base`).
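For contributors who switch base images often, the `FROM` change described above could be scripted. The following is only a rough sketch, not part of this PR: `swap_base_image` is a hypothetical helper name, and it assumes the Dockerfile has a `FROM` line that references an `indexify-extractor-base` image under some DockerHub namespace.

```shell
# Hypothetical helper (not part of the repository): rewrite the FROM line of a
# Dockerfile so that it points at a different extractor base image.
# Usage: swap_base_image <dockerfile> <new-base-image>
swap_base_image() {
  dockerfile="$1"
  new_base="$2"
  tmp="$(mktemp)"
  # Only FROM lines that reference <namespace>/indexify-extractor-base are
  # rewritten; every other line is copied through unchanged.
  sed "s|^FROM [^ ]*/indexify-extractor-base|FROM ${new_base}|" "$dockerfile" > "$tmp" \
    && cat "$tmp" > "$dockerfile"
  rm -f "$tmp"
}

# Example, using the path and image names from the note above:
# swap_base_image dockerfiles/Dockerfiles.extractor yenicelik/indexify-extractor-base
```

Remember to revert the `FROM` line before committing, so the shared Dockerfile keeps pointing at the upstream base image.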