spider_skips property broken #401

Open
paullew opened this issue Mar 14, 2016 · 16 comments · Fixed by #447

Comments

@paullew

paullew commented Mar 14, 2016

I'm using the default spider.yaml from https://raw.githubusercontent.com/BBC-News/wraith/master/templates/configs/spider.yaml

I've tried running it with wraith installed locally on my mac, and also via the wraith docker image. Both fail, with different error messages.

On my mac locally:

$ wraith capture spider.yaml
Config validated. No serious issues found.
no paths defined in config, crawling from site root
creating new spider file
/Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:298:in `=~': type mismatch: String given (TypeError)
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:298:in `block in skip_link?'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:298:in `any?'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:298:in `skip_link?'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:256:in `visit_link?'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:151:in `block in run'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:151:in `delete_if'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:151:in `run'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:92:in `block in crawl'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:83:in `initialize'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:90:in `new'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:90:in `crawl'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/anemone-0.7.2/lib/anemone/core.rb:18:in `crawl'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/wraith-3.1.0/lib/wraith/spider.rb:69:in `spider'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/wraith-3.1.0/lib/wraith/spider.rb:35:in `determine_paths'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/wraith-3.1.0/lib/wraith/spider.rb:23:in `check_for_paths'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/wraith-3.1.0/lib/wraith/cli.rb:36:in `check_for_paths'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/wraith-3.1.0/lib/wraith/cli.rb:133:in `block in capture'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/wraith-3.1.0/lib/wraith/cli.rb:28:in `within_acceptable_limits'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/wraith-3.1.0/lib/wraith/cli.rb:130:in `capture'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/thor-0.19.1/lib/thor/command.rb:27:in `run'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/thor-0.19.1/lib/thor/invocation.rb:126:in `invoke_command'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/thor-0.19.1/lib/thor.rb:359:in `dispatch'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/thor-0.19.1/lib/thor/base.rb:440:in `start'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/gems/wraith-3.1.0/bin/wraith:5:in `<top (required)>'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/bin/wraith:23:in `load'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/bin/wraith:23:in `<main>'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/bin/ruby_executable_hooks:15:in `eval'
    from /Users/paullew/.rvm/gems/ruby-2.2.1/bin/ruby_executable_hooks:15:in `<main>'

Running it via the wraith docker image:

$ docker run --rm -P -v ~/devel/resources/testing/wraith:/wraithy -w='/wraithy' bbcnews/wraith capture spider.yaml
/usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.1.2/lib/wraith/spider.rb:64:in `spider': undefined local variable or method `wraith' for #<Wraith::Crawler:0x005648d0ca4be8> (NameError)
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.1.2/lib/wraith/spider.rb:36:in `determine_paths'
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.1.2/lib/wraith/spider.rb:24:in `check_for_paths'
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.1.2/lib/wraith/cli.rb:36:in `check_for_paths'
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.1.2/lib/wraith/cli.rb:134:in `block in capture'
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.1.2/lib/wraith/cli.rb:28:in `within_acceptable_limits'
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.1.2/lib/wraith/cli.rb:131:in `capture'
    from /usr/local/lib/ruby/gems/2.1.0/gems/thor-0.19.1/lib/thor/command.rb:27:in `run'
    from /usr/local/lib/ruby/gems/2.1.0/gems/thor-0.19.1/lib/thor/invocation.rb:126:in `invoke_command'
    from /usr/local/lib/ruby/gems/2.1.0/gems/thor-0.19.1/lib/thor.rb:359:in `dispatch'
    from /usr/local/lib/ruby/gems/2.1.0/gems/thor-0.19.1/lib/thor/base.rb:440:in `start'
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.1.2/bin/wraith:5:in `<top (required)>'
    from /usr/local/bin/wraith:23:in `load'
    from /usr/local/bin/wraith:23:in `<main>'
Config validated. No serious issues found.
no paths defined in config, crawling from site root
@altV

altV commented Mar 18, 2016

I'm getting undefined local variable or method `wraith' for #<Wraith::Crawler:0x005648d0ca4be8> (NameError) for default spider.yml as well, version 3.1.2

@imagreenplant

Same here:

wraith capture configs/spider.yaml
Config validated. No serious issues found.
no paths defined in config, crawling from site root
/Library/Ruby/Gems/2.0.0/gems/wraith-3.1.2/lib/wraith/spider.rb:64:in `spider': undefined local variable or method `wraith' for #<Wraith::Crawler:0x007fd2f1938610> (NameError)

@Vexrm

Vexrm commented Mar 28, 2016

Ditto. I can't add extra information, but am watching for more info.

@trioni

trioni commented Mar 31, 2016

I get the same thing. Same error as @imagreenplant

@Dbuggerx

Dbuggerx commented Jun 6, 2016

I'm also getting the same error as @imagreenplant. Is this solved?

@ocrunch

ocrunch commented Jun 8, 2016

Same Issue!

@slimatic

slimatic commented Jun 9, 2016

Any thoughts on what this issue could be?

/usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.2.0/lib/wraith/spider.rb:64:in `spider': undefined local variable or method `wraith' for #<Wraith::Crawler:0x0055cde740ed58> (NameError)

@xinbin

xinbin commented Jun 14, 2016

I have the same error message as @slimatic , full dump:

/usr/local/lib/ruby/gems/2.2.0/gems/wraith-3.1.2/lib/wraith/spider.rb:64:in 'spider': undefined local variable or method 'wraith' for # (NameError)
    from /usr/local/lib/ruby/gems/2.2.0/gems/wraith-3.1.2/lib/wraith/spider.rb:36:in 'determine_paths'
    from /usr/local/lib/ruby/gems/2.2.0/gems/wraith-3.1.2/lib/wraith/spider.rb:24:in 'check_for_paths'
    from /usr/local/lib/ruby/gems/2.2.0/gems/wraith-3.1.2/lib/wraith/cli.rb:36:in 'check_for_paths'
    from /usr/local/lib/ruby/gems/2.2.0/gems/wraith-3.1.2/lib/wraith/cli.rb:134:in 'block in capture'
    from /usr/local/lib/ruby/gems/2.2.0/gems/wraith-3.1.2/lib/wraith/cli.rb:28:in 'within_acceptable_limits'
    from /usr/local/lib/ruby/gems/2.2.0/gems/wraith-3.1.2/lib/wraith/cli.rb:131:in 'capture'
    from /usr/local/lib/ruby/gems/2.2.0/gems/thor-0.19.1/lib/thor/command.rb:27:in 'run'
    from /usr/local/lib/ruby/gems/2.2.0/gems/thor-0.19.1/lib/thor/invocation.rb:126:in 'invoke_command'
    from /usr/local/lib/ruby/gems/2.2.0/gems/thor-0.19.1/lib/thor.rb:359:in 'dispatch'
    from /usr/local/lib/ruby/gems/2.2.0/gems/thor-0.19.1/lib/thor/base.rb:440:in 'start'
    from /usr/local/lib/ruby/gems/2.2.0/gems/wraith-3.1.2/bin/wraith:5:in '<top (required)>'
    from /usr/local/bin/wraith:23:in 'load'
    from /usr/local/bin/wraith:23:in '<main>'

@bjorndavis

I'm getting this as well:

C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/wraith-3.2.0/lib/wraith/spider.rb:65:in `spider': undefined local variable or method `wraith' for #<Wraith::Crawler:0x00000002e81ca8> (NameError)
        from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/wraith-3.2.0/lib/wraith/spider.rb:38:in `determine_paths'
        from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/wraith-3.2.0/lib/wraith/spider.rb:24:in `check_for_paths'
        from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/wraith-3.2.0/lib/wraith/cli.rb:36:in `check_for_paths'
        from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/wraith-3.2.0/lib/wraith/cli.rb:134:in `block in capture'
        from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/wraith-3.2.0/lib/wraith/cli.rb:28:in `within_acceptable_limits'
        from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/wraith-3.2.0/lib/wraith/cli.rb:131:in `capture'
        from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/thor-0.19.1/lib/thor/command.rb:27:in `run'
        from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/thor-0.19.1/lib/thor/invocation.rb:126:in `invoke_command'
        from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/thor-0.19.1/lib/thor.rb:359:in `dispatch'
        from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/thor-0.19.1/lib/thor/base.rb:440:in `start'
        from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/wraith-3.2.0/bin/wraith:5:in `<top (required)>'
        from C:/Ruby22-x64/bin/wraith:23:in `load'
        from C:/Ruby22-x64/bin/wraith:23:in `<main>'

@catchergeese

It fails for me too (default spider config yaml file, running in docker container):

$ wraith capture configs/spider.yaml
/usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.2.0/lib/wraith/spider.rb:64:in `spider': undefined local variable or method `wraith' for #<Wraith::Crawler:0x0055808ac4a508> (NameError)
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.2.0/lib/wraith/spider.rb:36:in `determine_paths'
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.2.0/lib/wraith/spider.rb:24:in `check_for_paths'
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.2.0/lib/wraith/cli.rb:36:in `check_for_paths'
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.2.0/lib/wraith/cli.rb:134:in `block in capture'
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.2.0/lib/wraith/cli.rb:28:in `within_acceptable_limits'
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.2.0/lib/wraith/cli.rb:131:in `capture'
    from /usr/local/lib/ruby/gems/2.1.0/gems/thor-0.19.1/lib/thor/command.rb:27:in `run'
    from /usr/local/lib/ruby/gems/2.1.0/gems/thor-0.19.1/lib/thor/invocation.rb:126:in `invoke_command'
    from /usr/local/lib/ruby/gems/2.1.0/gems/thor-0.19.1/lib/thor.rb:359:in `dispatch'
    from /usr/local/lib/ruby/gems/2.1.0/gems/thor-0.19.1/lib/thor/base.rb:440:in `start'
    from /usr/local/lib/ruby/gems/2.1.0/gems/wraith-3.2.0/bin/wraith:5:in `<top (required)>'
    from /usr/local/bin/wraith:23:in `load'
    from /usr/local/bin/wraith:23:in `<main>'

Is there any chance this will be fixed in the near future?

@kyleskrinak

3.2.1 does not address this issue on my system. I'm still seeing the error message:

.rvm/gems/ruby-2.2.0/gems/anemone-0.7.2/lib/anemone/core.rb:298:in `=~': type mismatch: String given (TypeError)
        from /Users/x/.rvm/gems/ruby-2.2.0/gems/anemone-0.7.2/lib/anemone/core.rb:298:in `block in skip_link?'
        from /Users/x/.rvm/gems/ruby-2.2.0/gems/anemone-0.7.2/lib/anemone/core.rb:298:in `any?'
        etc…

@ChrisBAshton
Contributor

ChrisBAshton commented Aug 29, 2016

This error has been fixed in 3.2.1:

'spider': undefined local variable or method 'wraith' for # (NameError)

However, I can see that the original error in this issue is:

core.rb:298:in `=~': type mismatch: String given (TypeError)

This issue has been closed in error. Re-opening.
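
For anyone unfamiliar with this TypeError: in Ruby, `String#=~` raises it when the right-hand side is another String rather than a Regexp. The sketch below reproduces the message without Anemone. The `skip_link?` method here is a hypothetical stand-in that borrows the method name from the stack trace; it is an assumption about the shape of the check, not Anemone's actual source.

```ruby
# Hypothetical stand-in for the skip check named in the stack trace.
# Each pattern is matched against the path with =~.
def skip_link?(path, patterns)
  patterns.any? { |pattern| path =~ pattern }
end

# A Regexp pattern is matched normally.
puts skip_link?("/baz/page.html", [%r{^/baz/}])

# A plain String pattern makes String#=~ raise TypeError.
begin
  skip_link?("/foo/bar.html", ["/foo/bar.html"])
rescue TypeError => e
  puts "#{e.class}: #{e.message}"
end
```

If the skip patterns reach the crawler as a mix of Regexp and String objects, a check of this shape would fail on the first String it meets, which would be consistent with the trace above.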

ChrisBAshton reopened this Aug 29, 2016
@Peter-Petrik

Peter-Petrik commented Oct 11, 2016

Experiencing this in 3.2.1

/var/lib/gems/1.9.1/gems/anemone-0.7.2/lib/anemone/core.rb:298:in `=~': type mismatch: String given (TypeError)

In spider.yaml I commented out:
- !ruby/regexp /^\/baz\//
and I'm no longer seeing the error.
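
One way to narrow this down is to check which Ruby class each `spider_skips` entry has after the YAML is loaded. The sketch below is only a diagnostic under assumptions: the config fragment is an illustrative one containing a plain-string entry (`/foo/bar.html`, assumed here) next to the `!ruby/regexp` entry quoted above, and it uses `YAML.safe_load` with `permitted_classes`, which may not be how wraith itself loads the file.

```ruby
require "yaml"

# Assumed config fragment: one plain-string entry and one regexp entry.
config_text = <<~'YAML'
  spider_skips:
    - /foo/bar.html
    - !ruby/regexp /^\/baz\//
YAML

# Regexp has to be permitted explicitly for safe_load to build it from the tag.
config = YAML.safe_load(config_text, permitted_classes: [Regexp])

# Print the class of every entry so Strings and Regexps are easy to tell apart.
config["spider_skips"].each do |entry|
  puts "#{entry.inspect} is a #{entry.class}"
end
```

Any entry reported as a String is a candidate for the `=~` type mismatch.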

ChrisBAshton changed the title from "Default spider.yaml fails with error" to "spider_skips property broken" Nov 25, 2016
@sembrat

sembrat commented May 8, 2018

For the folks encountering this issue, does removing the non-regexp entries from spider_skips fix it?

In some quick testing on my end, entries written as Ruby regexps were handled fine, whereas entries given as plain path strings broke spider_skips.

To work around this, I had to rewrite every plain string I wanted to skip as a regexp that matches that exact string, which isn't exactly ideal for those (like me) who are awful at regexp syntax.
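
For those who would rather not hand-write the escaping, Ruby can generate an exact-match regexp from a literal path. This is a minimal sketch; `exact_path_regexp` is a hypothetical helper name, not part of wraith or Anemone, and the path is only an example.

```ruby
# Hypothetical helper: turn a literal path into a regexp that matches only
# that exact path. Regexp.escape neutralizes characters such as "." and "?".
def exact_path_regexp(path)
  /\A#{Regexp.escape(path)}\z/
end

pattern = exact_path_regexp("/foo/bar.html")

puts pattern.match?("/foo/bar.html")        # the exact path
puts pattern.match?("/foo/barXhtml")        # "." is no longer a wildcard
puts pattern.match?("/foo/bar.html/extra")  # longer paths are not matched
```

Printing `pattern.source` shows the escaped form, which can then be copied into a `!ruby/regexp` entry in spider.yaml.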

@edurenye

This error is related to Anemone, the web-spider framework that wraith uses. That framework is not maintained anymore; its last commit was in 2012.
I think we should replace it with Medusa, a maintained fork of Anemone that has the same API, so it should work without too much trouble.

@ErroneousBosch

@sembrat

Yes, switching to regex does seem to correct the errors.
