tracking issue: pdf ecommerce mode #2922

Open
1 of 3 tasks
cdxker opened this issue Dec 9, 2024 · 1 comment
@cdxker
Member

cdxker commented Dec 9, 2024

Description

Target(s)

<replace w/ name of the service(s) which are associated with this issue>

Community channels

Matrix is preferred. Reach out on Discord or Matrix for further assistance.

@vtempest

https://github.com/vtempest/ai-research-agent/tree/master/src/extractor/pdf-to-html
https://qwksearch.com/docs/docs/functions/src/extractor/pdf-to-html/
Check out this much more accurate pdf-to-html library. It is about 5k lines of code and handles formatting much better, including things like footnote linking, which OCR does not do.

Trieve's pdf2md is good for a few types of image PDF, but it is overkill and too costly for use cases like 10k PDFs, which this library handles for free.

https://blog.pgvecto.rs/why-hnsw-is-not-the-answer

You can easily run a 90% cheaper scaling stack with pgvector or Cloudflare vector AI.
It seems unclear who this is for. If it's for devs, they can just use Qdrant, Typesense, or HF transformers.js instead of a huge stack, which is why it should be separated into modules. As a dev shop you need to find cost-effective solutions, or your clients will find them later.
Khami tried to justify this by saying it's faster to use HNSW in Qdrant, but HNSW is slower and older than the NGT-based algorithms that have dominated the benchmark:
https://ann-benchmarks.com/index.html

We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.
Demonstrating empathy and kindness toward other people
Being respectful of differing opinions, viewpoints, and experiences

It's ironic that Nick Khami looks for bs excuses to ban great AI researchers like me, who found these ideas years ago, and only integrated them after banning me. Khami was calling YC founders elitist and douchey when I was encouraging him to apply... Then he became one, and this will get worse over the next 20 years. Investors have been messaged, and there's a bunch of AI accounts saying this is hostile and exclusionary. Hopefully you can learn to mature more and be forgiving of whatever got you into a passive-aggressive grudge, in the spirit of the holidays 🎁✌️


3 participants