feat: expose ScrapeHTML function in API #3
Thanks for the PR. However, I'm hesitant to increase the surface of the public API unless there's a good reason to, because doing so is typically not a reversible decision. What's the use case for providing the HTML body directly? Would providing a custom HTTP client solve your use case instead? |
No, there are two specific cases this does not cover.
I appreciate the apprehension about extending the API, but I feel that the current API is missing a few things.
I know I'm going to need those three things to move forward with what I'm building, so I was planning to upstream those changes. I absolutely understand if you don't want to expose and maintain those APIs through this package, though. For what it's worth, the recipe-scrapers library does expose that functionality. 🤷 |
Should be resolved by providing functional options to the API.
Fair call given the use cases you provided.
You mean having access to the raw ld+json data? I agree that providing a mechanism for clients to avoid the network call is something the API should offer. Let me think on this and get back to you. |
Yeah, some underlying access to an untyped map[string]interface{} for debugging errors and analysis. I don't know if that's worth exposing by default given that it would require some allocations that some users may not need. |
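As a rough illustration of the untyped access described above, the sketch below decodes an ld+json payload into a `map[string]interface{}` using only the standard library. The helper name `decodeLDJSON` and the sample payload are assumptions made for this example; nothing here reflects how the package actually parses its input.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeLDJSON decodes a single JSON-LD object into an untyped map.
// Hypothetical helper. Real pages may use a top-level array instead of an
// object, which this sketch reports as an error.
func decodeLDJSON(raw string) (map[string]interface{}, error) {
	var data map[string]interface{}
	if err := json.Unmarshal([]byte(raw), &data); err != nil {
		return nil, fmt.Errorf("decode ld+json: %w", err)
	}
	return data, nil
}

func main() {
	// Sample payload invented for the example.
	raw := `{"@type": "Recipe", "name": "Pancakes", "recipeYield": 4}`
	data, err := decodeLDJSON(raw)
	if err != nil {
		panic(err)
	}
	// Values are untyped, so callers need type assertions to use them.
	if name, ok := data["name"].(string); ok {
		fmt.Println(name)
	}
}
```

Every value in the map is boxed in an interface, which is the allocation cost raised in the comment above.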
Yeah, I'm not sure that's something we want to expose, at least in that form. Some websites might not be scraped using ld+json data and instead rely on good old scraping of DOM elements, in which case the I do see, however, the issue with not having visibility into what goes wrong, and I can think of a couple of alternative ways to alleviate this:
In terms of your PR, I think what you proposed is fine, however, I would also rename
If you make this change I'd be happy to merge it - but please also update |
Should be good now, thanks! |
This PR adds the ability to pass the HTML body via an `io.Reader` to a new function, `ScrapeHTML`. No business code was written; I've just moved the parts that work with the body into their own function and call it from the `ScrapeFrom` function.