A simple starter for headless Chrome + Selenium WebDriver in AWS Lambda using Python.
Lambda Selenium Starter provides a framework for seamless development & deployment of web scrapers, for any webpage, to AWS Lambda. To dive right in, check out Quick Start. Otherwise, visit the Wiki or blog post for a more detailed guide.
An example can be found here or by viewing some of the example projects in examples.
This starter was primarily inspired by the 21 Buttons guide/repo.
Technologies used are:
- Python 3.6
- Selenium
- Chrome driver
- Small chromium binary
- Docker
Using this starter, you can write and test your Selenium web scrapers almost exactly as you would without AWS Lambda, and you do not need to learn much about the underlying technologies.
Install Docker and dependencies: make fetch-dependencies
- Installing Docker
- Installing Docker compose
- Clone this repo
- Look at the current scraper in lambda_function.py: it navigates to Google and prints some messages.
- Modify lambda_function.py to perform your desired actions (using Selenium as you normally would). There are two important differences: make sure your functions accept the driver instance if they need to perform Selenium-related actions, and make sure your main function call comes from lambda_handler.
- Add any additional dependencies to requirements.txt
Run make docker-run to test your web scraper locally. It's highly recommended that you test locally before packaging it up for AWS, since it's much easier to debug locally! Check out the other available commands in the Makefile.
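The two differences described above (functions accept the driver, and the main call comes from lambda_handler) can be sketched roughly as follows. This is only an illustration, not the starter's actual lambda_function.py: create_driver, fetch_title, and the driver_factory parameter are hypothetical names, and the Chrome flags shown are common choices for headless use rather than the starter's confirmed configuration.

```python
def create_driver():
    """Hypothetical factory for a headless Chrome driver.

    Assumption: the starter may ship its own driver setup; this is a stand-in.
    Selenium is imported lazily so the module loads even where it is absent.
    """
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-gpu")
    return webdriver.Chrome(options=options)


def fetch_title(driver, url):
    """Scraping logic receives the driver instead of creating its own."""
    driver.get(url)
    return driver.title


def lambda_handler(event, context, driver_factory=create_driver):
    """Entry point: AWS Lambda calls this with (event, context) only.

    driver_factory is an illustrative extra argument that makes it possible
    to swap in a fake driver when testing locally.
    """
    driver = driver_factory()
    try:
        url = (event or {}).get("url", "https://www.google.com")
        title = fetch_title(driver, url)
        print("Page title:", title)
        return {"title": title}
    finally:
        # Always release the browser, even if the scrape raised.
        driver.quit()
```

Because every scraping function takes the driver as an argument, the same functions can be exercised locally with a stub object that only implements get, title, and quit.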
Once you're ready to upload to AWS, do the following:
- make build-lambda-package
- Upload the resulting build.zip file to your AWS Lambda function (typically this will involve using S3)
- Set Lambda environment variables (same values as in docker-compose.yml)
PYTHONPATH=/var/task/src:/var/task/lib
PATH=/var/task/bin
- Adjust the Lambda function parameters to suit your needs. For the given example:
- Timeout: +10 seconds
- Memory: +250MB
- Invoke your function using the AWS CLI
aws lambda invoke --function-name YOURFUNCTIONNAME out --log-type Tail
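As an alternative to the CLI, the same invocation can be made from Python. The sketch below uses boto3 (not part of this starter, and assumed to be installed with AWS credentials configured); invoke_scraper and decode_log_tail are hypothetical helper names. It requests the log tail, as --log-type Tail does, and decodes it from base64.

```python
import base64
import json


def decode_log_tail(response):
    """Return the decoded log tail from an invoke response, or '' if absent."""
    encoded = response.get("LogResult", "")
    return base64.b64decode(encoded).decode("utf-8") if encoded else ""


def invoke_scraper(function_name, payload=None):
    """Invoke the Lambda function and return (response body, log tail)."""
    import boto3  # imported lazily; assumed to be installed separately

    client = boto3.client("lambda")
    response = client.invoke(
        FunctionName=function_name,
        LogType="Tail",  # ask Lambda to include the tail of the execution log
        Payload=json.dumps(payload or {}).encode("utf-8"),
    )
    body = response["Payload"].read().decode("utf-8")
    return body, decode_log_tail(response)
```

Calling invoke_scraper("YOURFUNCTIONNAME") would return the function's output together with the tail of its log, which is usually enough to spot a timeout or an out-of-memory failure.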
To see what modifications a slightly larger web scraper might involve, check out this gist.