'Imports' feature and better spider support
Introduces (via #487):
- 'Imports' functionality (#486)
- wraith spider [config_name] command (#488)
- wraith info command (#484)
Spidering and Imports
We have a lot of open 'Spider mode' issues. Many of these arise because the Wraith spidering logic is complex and difficult to test. For example, Wraith has to maintain a lot of internal state: it triggers spidering automatically when certain config properties are missing, but only if spidering hasn't already been done within the last x days (where x is itself a config option), and so on.
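To make the testing problem concrete, here is a minimal Python sketch of the kind of decision an automatic spider mode has to make on every run. It is not Wraith's actual (Ruby) code: the function name, the `max_spider_age_days` key and its default of 10 are all hypothetical. The point is that the answer depends on the config, on state left on disk by earlier runs, and on the current time.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_auto_spider(config: dict, last_spidered: Optional[datetime], now: datetime) -> bool:
    """Hypothetical auto-trigger check for an implicit spider mode."""
    # Explicit paths in the config mean there is nothing to spider.
    if "paths" in config:
        return False
    # No record of a previous spider run: spider now.
    if last_spidered is None:
        return True
    # Otherwise re-spider only once the cached result is older than the limit.
    # `max_spider_age_days` and its default are assumptions for illustration.
    max_age = timedelta(days=config.get("max_spider_age_days", 10))
    return now - last_spidered > max_age
```

Every branch needs its own combination of config, stored timestamp and clock to test, which is the state a dedicated command gets rid of.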
We can make Wraith much simpler under the hood by giving spidering its own dedicated spider command, which users run manually, as often as they like. A new 'Imports' feature then lets users import one config into another.
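Once spidering is a standalone command, its core can be tested in isolation. The Python sketch below is a toy breadth-first crawler that stays on the starting host. It is an illustration, not Wraith's implementation: the `links` dictionary is a hypothetical stand-in for fetching each page over HTTP and extracting its links.

```python
from collections import deque
from typing import Dict, List
from urllib.parse import urldefrag, urljoin, urlparse


def spider(start_url: str, links: Dict[str, List[str]]) -> List[str]:
    """Breadth-first crawl limited to the start URL's host; returns the paths found."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    paths = []
    while queue:
        url = queue.popleft()
        paths.append(urlparse(url).path or "/")
        # `links` stands in for fetching `url` and extracting its <a href> values.
        for href in links.get(url, []):
            # Resolve relative links and drop #fragments so /news and /news#top match.
            absolute, _fragment = urldefrag(urljoin(url, href))
            # Only follow links on the same host that haven't been queued yet.
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return paths
```

Because nothing here depends on when the crawl last ran, a fake link map is enough to check its behaviour.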
Paths determined by the spider command are stored in a YAML file rather than a .txt file. This reduces complexity further, because spidered paths are then stored in the same way as paths in non-spider use of Wraith. It also removes the Nokogiri dependency, which will speed up Wraith setup times slightly.
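Storing spidered paths "in the same way as non-spider use" means writing the same `paths:` mapping of label to path that a hand-written config uses. The Python sketch below emits that shape; the labelling scheme (`/` becomes `home`, slashes become `__`) is a hypothetical choice, not necessarily the one the spider command uses.

```python
from typing import List


def path_label(path: str) -> str:
    """Hypothetical label scheme: '/' -> 'home', '/news/world' -> 'news__world'."""
    stripped = path.strip("/")
    return stripped.replace("/", "__") if stripped else "home"


def paths_to_yaml(paths: List[str]) -> str:
    """Emit a `paths:` mapping shaped like a hand-written Wraith config."""
    lines = ["paths:"]
    for path in paths:
        # Plain YAML scalars; paths containing ': ' or ' #' would need quoting.
        lines.append(f"  {path_label(path)}: {path}")
    return "\n".join(lines) + "\n"
```

For arbitrary URLs a real YAML library is the safer way to serialise, since it handles quoting; the hand-rolled emitter just keeps the sketch dependency-free.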
The 'Imports' feature has many potential uses beyond supporting spidering. For example, you may have a common Wraith config defining the browser engine, the screen sizes to capture, the diff image colour, etc., and then a separate Wraith config for each of your sites. Each site config imports the base config, so you don't have to duplicate all of that information.
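One plausible way to resolve an import is a shallow merge in which the importing config's own keys take precedence, so a site config can override anything from the base. The Python sketch below assumes those semantics; `load` is a hypothetical stand-in for reading and parsing a YAML file, and the config keys are illustrative.

```python
from typing import Callable


def resolve_imports(config: dict, load: Callable[[str], dict]) -> dict:
    """Shallow-merge the imported config underneath the importing one."""
    if "imports" not in config:
        return config
    imported = load(config["imports"])
    # Later entries win, so keys set in the importing config override the base.
    return {**imported, **config}
```

A shallow merge keeps the rule easy to predict: a site config that sets `screen_widths` replaces the base list wholesale rather than merging the two lists.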
Proposed spider workflow
Say we have a simplified config, test.yml:
imports: "spider_configs.yml"
domains:
  test: http://www.test.example.com
  live: http://www.test.example.com
wraith spider test.yml
=> spiders the sites, and saves the paths to spider_configs.yml.
wraith capture test.yml
=> the spider_configs.yml paths are automatically imported into test.yml, and Wraith continues as if the paths had been specified manually.
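The two-step workflow above can be sketched in Python with an in-memory dictionary standing in for the files on disk. Everything here is hypothetical illustration: `crawl` stands in for the actual spidering (returning a label-to-path mapping), and crawling the first listed domain is an assumption. The point is that the imports file is the only state shared between the two commands, and capture never triggers spidering itself.

```python
from typing import Callable, Dict


def wraith_spider(config: dict, files: Dict[str, dict], crawl: Callable[[str], Dict[str, str]]) -> None:
    """Hypothetical `wraith spider`: crawl a domain, save paths to the imports file."""
    first_domain = next(iter(config["domains"].values()))
    files[config["imports"]] = {"paths": crawl(first_domain)}


def wraith_capture_config(config: dict, files: Dict[str, dict]) -> dict:
    """Hypothetical capture-time loading: imported keys sit underneath local ones."""
    imported = files.get(config["imports"], {})
    return {**imported, **config}
```

Running the spider step is explicit and repeatable; the capture step just reads whatever paths were last saved.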