Added wraith spider [config_name] command #488

Merged
merged 4 commits into from
Nov 25, 2016


@ChrisBAshton ChrisBAshton commented Nov 25, 2016

We have a lot of open 'Spider mode' issues, many of which arise from the Wraith spidering logic being quite complex and difficult to test.

Wraith has to maintain a lot of internal state: it triggers spidering automatically when certain config properties are missing, but only if spidering hasn't already been done within the number of days set by a config option, etc.
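To illustrate the kind of state-dependent decision described above, here is a minimal sketch in Ruby. It is not Wraith's actual code: the helper name `spider_needed?`, the `spider_days` key, and its default of 10 are assumptions made for illustration only.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical helper: decide whether an automatic spider run is due.
# `config` is a plain Hash; `last_spidered_at` is a Time, or nil if no
# previous crawl has been recorded.
def spider_needed?(config, last_spidered_at, now: Time.now)
  # Paths were given explicitly, so there is nothing to spider.
  return false if config.key?("paths")
  # No previous crawl on record, so one is needed.
  return true if last_spidered_at.nil?

  # Assumed option name and default, for illustration only.
  max_age_days = config.fetch("spider_days", 10)
  (now - last_spidered_at) > max_age_days * SECONDS_PER_DAY
end
```

Every branch here depends on config contents, the filesystem, or the clock, which is what makes this style of logic awkward to test.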

We can make Wraith much simpler under the hood by giving spidering its own dedicated spider command, which the user runs manually, as often as they like. A new 'Imports' feature allows users to import configs into one another.

Proposed workflow

Say we have a simplified config:

imports: "spider_configs.yml"

domains:
  test: http://www.test.example.com
  live: http://www.test.example.com

1. wraith spider test.yml => spiders the sites, and saves the paths to spider_configs.yml. NB: the first time you run this, Wraith will warn that spider_configs.yml doesn't exist. That's ok.
2. wraith capture test.yml => the spider_configs.yml paths are automatically imported into the test.yml, and Wraith continues as if the paths were specified manually.
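
The two steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the helper names `save_spidered_paths` and `load_config`, the warning text, and the example paths are all hypothetical.

```ruby
require "yaml"
require "tmpdir"

# Step 1 (hypothetical): the spider writes the paths it found to the imported file.
def save_spidered_paths(file, paths)
  File.write(file, { "paths" => paths }.to_yaml)
end

# Step 2 (hypothetical): load a config and fold in the file named by `imports`.
def load_config(file)
  config = YAML.safe_load(File.read(file)) || {}
  import_name = config["imports"]
  return config unless import_name

  import_file = File.join(File.dirname(file), import_name)
  unless File.exist?(import_file)
    # First run: the spider has not produced the file yet.
    warn "WARNING: imported config #{import_name.inspect} not found"
    return config
  end

  imported = YAML.safe_load(File.read(import_file)) || {}
  # Keys in the importing config win over imported ones.
  imported.merge(config)
end

Dir.mktmpdir do |dir|
  config_file = File.join(dir, "test.yml")
  File.write(config_file, <<~CONFIG)
    imports: "spider_configs.yml"
    domains:
      test: http://www.test.example.com
      live: http://www.test.example.com
  CONFIG

  load_config(config_file) # warns: spider_configs.yml does not exist yet
  # Illustrative paths standing in for what a spider run would find.
  save_spidered_paths(File.join(dir, "spider_configs.yml"), { "home" => "/", "about" => "/about" })
  p load_config(config_file)["paths"] # now includes the spidered paths
end
```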

The 'Imports' feature has scope for lots of different uses. For example, you may have a common Wraith config defining the browser engine, screen sizes to capture, colour of the diff image, etc. You can then have a separate Wraith config for each of your sites, each importing the base config, so you don't have to duplicate all of that information.
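
As a rough illustration of the shared-base idea, the sketch below layers a site config over a base config. The key names and values, the file name `base.yml`, and the helper `apply_import` are assumptions chosen for the example; a shallow merge in which the importing config wins is one plausible behaviour, not necessarily what Wraith does.

```ruby
require "yaml"

# Hypothetical base config shared by every site (keys are illustrative).
BASE = YAML.safe_load(<<~CONFIG)
  browser: "phantomjs"
  screen_widths:
    - 320
    - 1280
  fuzz: "20%"
CONFIG

# Hypothetical per-site config; it overrides one value from the base.
SITE = YAML.safe_load(<<~CONFIG)
  imports: "base.yml"
  domains:
    live: http://www.test.example.com
  fuzz: "5%"
CONFIG

# Shallow merge: the importing (site) config wins on conflicting keys.
def apply_import(imported, config)
  imported.merge(config)
end

merged = apply_import(BASE, SITE)
puts merged["browser"] # inherited from the base config
puts merged["fuzz"]    # overridden by the site config
```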

@ChrisBAshton ChrisBAshton mentioned this pull request Nov 25, 2016
3 tasks
@ChrisBAshton ChrisBAshton changed the title Added wraith spider command Added wraith spider [config_name] command Nov 25, 2016
@ChrisBAshton ChrisBAshton changed the base branch from master to v4 November 25, 2016 13:44
@ChrisBAshton ChrisBAshton merged commit fda3d67 into v4 Nov 25, 2016
@ChrisBAshton ChrisBAshton deleted the spider branch November 25, 2016 14:32
@akiessling

The spider command overwrites the file that is configured with imports. So you can't save a common configuration to that file AND have the spider command update the paths. Maybe the paths should be loaded from a separate file?

@ChrisBAshton
Contributor Author

That's a fair comment - I agree. I had a rethink after publishing v4: it should have been more explicit, i.e.

spider_file: 'spider_paths.yml'

Would be less stateful and easier to maintain. PRs are welcome!
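
A minimal sketch of how an explicit `spider_file` key might behave, assuming the spider writes only to that file and leaves the `imports` file alone. The helpers `write_spider_paths` and `resolve_paths` are hypothetical and nothing like this exists in v4 yet.

```ruby
require "yaml"
require "tmpdir"

# Hypothetical: the spider writes only to the file named by `spider_file`,
# so the file named by `imports` is never overwritten.
def write_spider_paths(config, dir, paths)
  target = config.fetch("spider_file") # raises KeyError if the key is not set
  File.write(File.join(dir, target), paths.to_yaml)
end

# Hypothetical: at capture time, paths come from `spider_file`
# unless the config lists them itself.
def resolve_paths(config, dir)
  return config["paths"] if config["paths"]

  file = File.join(dir, config.fetch("spider_file"))
  File.exist?(file) ? YAML.safe_load(File.read(file)) : {}
end
```

With this split, a common config referenced by `imports` and the generated paths live in different files, which addresses the overwrite problem raised above.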

@ajoah
Contributor

ajoah commented Dec 7, 2016

Hi @ChrisBAshton

I'm trying to use this new command, but I may have missed something, because if I use this config example: http://bbc-news.github.io/wraith/configs.html#Spiderconfig ,

I get :

wraith spider configs/demo.yml
Config validated. No serious issues found.
ERROR: unable to find referenced imported config "paths_generated_by_spider.yml"
