Can you make it an external DSL? #8

Open
nadirvardar opened this issue Sep 9, 2015 · 4 comments

Comments

@nadirvardar

For example, a file that contains the DSL, which the app can run and return the result.

@bplawler
Owner

bplawler commented Sep 9, 2015

That's not really a use case that I have, as what I am doing requires that the crawler DSL work alongside the rest of my Scala app. I'm certainly willing to accept any pull requests though!

@ahirner

ahirner commented Apr 12, 2016

I'm also interested and looking for guidance: What is the most atomic expression to feed into a Crawler instance? After each consumed expression, it should be possible to inspect the nodeStack.

My initial guess is simply a (Function1[Unit, => Unit], ElementProcessor), but I'm still confused by the required stack state and the varying types of the entry points (in, from, forAll).
Meanwhile, I bumped the version of HtmlUnit and made the crawler work for 2.21.
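
As a rough illustration of what "one atomic expression at a time, with an inspectable stack" could look like for an external DSL, here is a minimal sketch in plain Scala. It is not the crawler's actual API: `Step`, `parseLine`, `step`, and `trace` are hypothetical names, the keywords only borrow the entry-point names mentioned above, and the stack holds placeholder strings rather than HtmlUnit nodes.

```scala
// Hypothetical sketch: none of these types or functions exist in the crawler library.
sealed trait Step
final case class In(url: String) extends Step
final case class From(selector: String) extends Step
final case class ForAll(selector: String) extends Step

object ExternalDsl {
  // Parse one line of DSL text ("<keyword> <argument>") into a Step.
  def parseLine(line: String): Either[String, Step] =
    line.trim.split("\\s+", 2) match {
      case Array("in", arg)     => Right(In(arg))
      case Array("from", arg)   => Right(From(arg))
      case Array("forAll", arg) => Right(ForAll(arg))
      case _                    => Left(s"cannot parse: '$line'")
    }

  // Apply a single step to the stack. A real interpreter would call
  // HtmlUnit here; this one only pushes placeholder strings.
  def step(stack: List[String], s: Step): List[String] = s match {
    case In(url)     => List(s"page($url)")      // assumed: `in` resets the stack
    case From(sel)   => s"node($sel)" :: stack   // assumed: pushes one node
    case ForAll(sel) => s"nodes($sel)" :: stack  // assumed: pushes a node set
  }

  // Run a whole script and return the stack as it looked after each step,
  // or the first parse error encountered.
  def trace(script: String): Either[String, List[List[String]]] = {
    val lines = script.split("\n").map(_.trim).filter(_.nonEmpty).toList
    val parsed = lines.foldRight[Either[String, List[Step]]](Right(Nil)) {
      (line, acc) => for { s <- parseLine(line); rest <- acc } yield s :: rest
    }
    parsed.map(steps => steps.scanLeft(List.empty[String])(step).tail)
  }

  def main(args: Array[String]): Unit = {
    val script = "in http://example.com\nforAll div.item\nfrom a.title"
    trace(script) match {
      case Right(stacks) => stacks.foreach(println) // one stack per consumed step
      case Left(error)   => println(error)
    }
  }
}
```

Keeping `step` a pure function from (stack, step) to a new stack is what makes the intermediate states easy to inspect; whether the real entry points can be reduced to such a uniform shape is exactly the open question.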

@bplawler
Owner

@ahirner Thanks for bumping the HtmlUnit version. Out of curiosity, are you able to get the crawler to work with JavaScript-heavy pages that use, e.g., AngularJS or another library to render the front end?

The various entry points are indeed confusing, and I would probably not write that code the same way if I were starting this project over today. Sorry for the uninformative response; it has long been my desire to start this codebase over now that I have a couple more years of Scala experience under my belt...

@ahirner

ahirner commented Apr 15, 2016

@bplawler the code is highly educational, e.g. in how to handle the explicit type conversions that are necessary with HtmlUnit. Thanks!
For my use case, it has handled quirky JS pages just fine so far. This includes an SPA that streams the DOM in a quite old-fashioned way. To scrape such cases, I first injected basic JS query functions, which are meant to be used uniformly across sites. I haven't yet tested the Rhino/HtmlUnit combo with bleeding-edge or more heavyweight frontends.
