scrapy crawl -o LargusNew.csv LargusNew -s JOBDIR=crawls/LargusNew
2020-09-25 15:57:26 [scrapy.utils.log] INFO: Scrapy 2.3.0 started (bot: Largus)
2020-09-25 15:57:26 [scrapy.utils.log] INFO: Versions: lxml 4.5.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.8.2 (default, Jul 16 2020, 14:00:26) - [GCC 9.3.0], pyOpenSSL 19.0.0 (OpenSSL 1.1.1f 31 Mar 2020), cryptography 2.8, Platform Linux-5.4.0-48-generic-x86_64-with-glibc2.29
2020-09-25 15:57:26 [scrapy.crawler] INFO: Overridden settings:
{'AUTOTHROTTLE_MAX_DELAY': 180,
 'BOT_NAME': 'Largus',
 'CONCURRENT_REQUESTS': 2,
 'CONCURRENT_REQUESTS_PER_DOMAIN': 1,
 'COOKIES_ENABLED': False,
 'DOWNLOAD_DELAY': 180,
 'HTTPCACHE_ALWAYS_STORE': True,
 'HTTPCACHE_ENABLED': True,
 'HTTPCACHE_IGNORE_RESPONSE_CACHE_CONTROLS': ['no-store', 'no-cache', 'must-revalidate', 'max-age', 'max-stale', 'private', 'proxy-revalidate', 'only-if-cached'],
 'HTTPCACHE_STORAGE': 'Largus.MySQLStorage.MySQLStorage',
 'LOG_LEVEL': 'INFO',
 'NEWSPIDER_MODULE': 'Largus.spiders',
 'RETRY_HTTP_CODES': [500, 503, 504, 400, 403, 404, 408, 104],
 'SPIDER_MODULES': ['Largus.spiders'],
 'TELNETCONSOLE_ENABLED': False,
 'USER_AGENT': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) '
               'Gecko/20100101 Firefox/80.0'}
2020-09-25 15:57:26 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.feedexport.FeedExporter',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.spiderstate.SpiderState']
__init__():
2020-09-25 15:57:26 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'rotating_proxies.middlewares.RotatingProxyMiddleware',
 'rotating_proxies.middlewares.BanDetectionMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats',
 'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware']
2020-09-25 15:57:26 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2020-09-25 15:57:26 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2020-09-25 15:57:26 [scrapy.core.engine] INFO: Spider opened
2020-09-25 15:57:26 [scrapy.core.scheduler] INFO: Resuming crawl (40951 requests scheduled)
2020-09-25 15:57:26 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-09-25 15:57:26 [rotating_proxies.middlewares] INFO: Proxies(good: 0, dead: 0, unchecked: 2, reanimated: 0, mean backoff time: 0s)
store_response(status, url): 200 https://www.largus.fr/fiche-technique/Audi/A5+Cabriolet/Ii/2020/Cabriolet+2+Portes/20+Tfsi+190+Design+Luxe+Stro-1680763.html <== 1st spider starts
2020-09-25 15:57:56 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
2020-09-25 15:58:26 [scrapy.extensions.logstats] INFO: Crawled 1 pages (at 1 pages/min), scraped 1 items (at 1 items/min)
2020-09-25 15:58:26 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
2020-09-25 15:58:56 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
2020-09-25 15:59:26 [scrapy.extensions.logstats] INFO: Crawled 1 pages (at 0 pages/min), scraped 1 items (at 0 items/min)
2020-09-25 15:59:26 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
2020-09-25 15:59:56 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
2020-09-25 16:00:26 [scrapy.extensions.logstats] INFO: Crawled 1 pages (at 0 pages/min), scraped 1 items (at 0 items/min)
2020-09-25 16:00:26 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
2020-09-25 16:00:56 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
store_response(status, url): 200 https://www.largus.fr/fiche-technique/Porsche/718+Boxster/I/2020/Cabriolet+2+Portes/20+300ch+Pdk-1552648.html <== 2nd spider starts 180 seconds after !
2020-09-25 16:01:26 [scrapy.extensions.logstats] INFO: Crawled 2 pages (at 1 pages/min), scraped 2 items (at 1 items/min)
2020-09-25 16:01:26 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
^C2020-09-25 16:01:39 [scrapy.crawler] INFO: Received SIGINT, shutting down gracefully. Send again to force
2020-09-25 16:01:39 [scrapy.core.engine] INFO: Closing spider (shutdown)
2020-09-25 16:01:56 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
2020-09-25 16:02:26 [scrapy.extensions.logstats] INFO: Crawled 2 pages (at 0 pages/min), scraped 2 items (at 0 items/min)
2020-09-25 16:02:26 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
2020-09-25 16:02:56 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
2020-09-25 16:03:26 [scrapy.extensions.logstats] INFO: Crawled 2 pages (at 0 pages/min), scraped 2 items (at 0 items/min)
2020-09-25 16:03:26 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
2020-09-25 16:03:56 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
2020-09-25 16:04:26 [scrapy.extensions.logstats] INFO: Crawled 2 pages (at 0 pages/min), scraped 2 items (at 0 items/min)
2020-09-25 16:04:26 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
2020-09-25 16:04:56 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
store_response(status, url): 200 https://www.largus.fr/fiche-technique/Mercedes-Benz/Classe+C+Coupe/Ii+C205/2019/Coupe+2+Portes/300+245ch+Fascination+9g-Tro-1638125.html
2020-09-25 16:05:26 [scrapy.extensions.logstats] INFO: Crawled 3 pages (at 1 pages/min), scraped 3 items (at 1 items/min)
2020-09-25 16:05:26 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
2020-09-25 16:05:56 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
2020-09-25 16:06:26 [scrapy.extensions.logstats] INFO: Crawled 3 pages (at 0 pages/min), scraped 3 items (at 0 items/min)
2020-09-25 16:06:26 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
2020-09-25 16:06:56 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
2020-09-25 16:07:26 [scrapy.extensions.logstats] INFO: Crawled 3 pages (at 0 pages/min), scraped 3 items (at 0 items/min)
2020-09-25 16:07:26 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
2020-09-25 16:07:56 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
2020-09-25 16:08:26 [scrapy.extensions.logstats] INFO: Crawled 3 pages (at 0 pages/min), scraped 3 items (at 0 items/min)
2020-09-25 16:08:26 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
2020-09-25 16:08:56 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
2020-09-25 16:09:26 [scrapy.extensions.logstats] INFO: Crawled 3 pages (at 0 pages/min), scraped 3 items (at 0 items/min)
2020-09-25 16:09:26 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 0, unchecked: 1, reanimated: 0, mean backoff time: 0s)
store_response(status, url): 200 https://www.largus.fr/fiche-technique/Audi/A5+Cabriolet/Ii/2019/Cabriolet+2+Portes/45+Tfsi+245+Des+Qto+Stro+E6dt+142g-2132723.html
2020-09-25 16:09:35 [scrapy.extensions.feedexport] INFO: Stored csv feed (4 items) in: LargusNew.csv
2020-09-25 16:09:35 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 1977,
 'downloader/request_count': 4,
 'downloader/request_method_count/GET': 4,
 'downloader/response_bytes': 109335,
 'downloader/response_count': 4,
 'downloader/response_status_count/200': 4,
 'dupefilter/filtered': 61,
 'elapsed_time_seconds': 728.648941,
 'finish_reason': 'shutdown',
 'finish_time': datetime.datetime(2020, 9, 25, 14, 9, 35, 463291),
 'httpcache/firsthand': 4,
 'httpcache/miss': 4,
 'httpcache/store': 4,
 'item_scraped_count': 4,
 'log_count/INFO': 48,
 'memusage/max': 80711680,
 'memusage/startup': 71647232,
 'proxies/good': 1,
 'proxies/mean_backoff': 0.0,
 'proxies/reanimated': 0,
 'proxies/unchecked': 1,
 'request_depth_max': 816,
 'response_received_count': 4,
 'scheduler/dequeued': 4,
 'scheduler/dequeued/disk': 4,
 'scheduler/enqueued': 3,
 'scheduler/enqueued/disk': 3,
 'start_time': datetime.datetime(2020, 9, 25, 13, 57, 26, 814350)}
2020-09-25 16:09:35 [scrapy.core.engine] INFO: Spider closed (shutdown)
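
As a quick sanity check on the pacing in this log, the figures from the stats dump can be turned into an average time per page and compared with the configured DOWNLOAD_DELAY of 180. This is only a rough sketch: the variable names are mine, the numbers are copied by hand from the log above, and the remaining-time estimate assumes the rate stays the same for the 40951 scheduled requests, which may well not hold.

```python
from datetime import datetime

# Values copied by hand from the "Dumping Scrapy stats" block in the log.
start_time = datetime(2020, 9, 25, 13, 57, 26, 814350)
finish_time = datetime(2020, 9, 25, 14, 9, 35, 463291)
response_count = 4          # 'downloader/response_count'
download_delay = 180        # 'DOWNLOAD_DELAY' from the overridden settings
scheduled_requests = 40951  # "Resuming crawl (40951 requests scheduled)"

# Wall-clock duration of the run; should match 'elapsed_time_seconds'.
elapsed = (finish_time - start_time).total_seconds()

# Average spacing between responses over the whole run.
seconds_per_page = elapsed / response_count

# Naive projection, assuming the same average rate for every queued request.
remaining_days = scheduled_requests * seconds_per_page / 86400

print(f"elapsed: {elapsed:.6f} s")
print(f"average: {seconds_per_page:.1f} s per page (DOWNLOAD_DELAY = {download_delay})")
print(f"projected time for the queue at this rate: {remaining_days:.1f} days")
```

The average comes out close to the configured delay, which suggests the crawl is being paced by DOWNLOAD_DELAY with CONCURRENT_REQUESTS_PER_DOMAIN set to 1, rather than by the proxies or the cache storage.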