Skip to content

Latest commit

 

History

History
60 lines (50 loc) · 2.21 KB

SPIDER_NAMING.md

File metadata and controls

60 lines (50 loc) · 2.21 KB

Naming your spider

With so many spider files it is important that we maintain a high level of naming consistency. At the time of writing there is a single directory containing all of our spiders.

The file and the spider within it should share the same name. The name should reflect the brand or company name that the spider is for. Lower case characters must be used. If your Python file is great_name.py then your spider should look something like:

import scrapy

class GreatNameSpider(scrapy.Spider):
    name = "great_name"

The camel case approach illustrated for the class name is further recommended. If you know the company / brand that you are spidering only has outlets in one country then it is helpful to suffix your name with the ISO alpha-2 code for that country. For example, if great_name only had outlets in South Africa, then name the spider file great_name_za.py. Internally, it should look something like:

import scrapy

class GreatNameZASpider(scrapy.Spider):
    name = "great_name_za"

If a brand / company is present in multiple territories, and the spider is able to access all of these using a common approach / API, then do not qualify the spider name with a country code. An example of this would be the Decathlon spider.

Sometimes things get a little complex. There is a primary mcdonalds.py spider which is able to process many major countries with a common API. There are country specific spiders, e.g. mcdonalds_pt.py, which cover a single territory. There are also some McDonald's spiders which cover a small number of countries with the same approach, e.g. mcdonalds_baltics.py.

It is quite important that you only add a two character suffix to the spider name if you are sure that all POIs generated by the spider are in the same country. The reason for this is that ATP pipeline code will add this country suffix as the country code to scraped POIs if it has not been set by the spider itself.