Lambda Selenium Starter

A simple starter for headless Chrome + Selenium WebDriver in AWS Lambda using Python.

Lambda Selenium Starter provides a framework for developing web scrapers for any webpage and deploying them to AWS Lambda. To dive right in, check out Quick Start. Otherwise, visit the Wiki or the blog post for a more detailed guide.

An example can be found here, and more example projects are available in the examples folder.

How does it work?

This starter was primarily inspired by the 21 Buttons guide/repo.

Technologies used are:

How is this unique?

Using this starter, you can write, test, and develop your Selenium web scrapers almost exactly as you would without AWS Lambda. You can build them without learning much about the underlying technologies.

Requirements

Install Docker and Docker Compose (see Installing Docker and Installing Docker compose), then fetch the project dependencies:

make fetch-dependencies

Quick Start

Clone this repo
Look at the current scraper in lambda_function.py: it navigates to Google and prints some messages.
Modify lambda_function.py to perform your desired actions, using Selenium as you normally would. There are two important differences: any function that performs Selenium actions must accept the driver instance as a parameter, and your main function must be called from lambda_handler.
Add any additional dependencies to requirements.txt
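The two differences above can be sketched as follows. This is a hypothetical lambda_function.py, not the starter's actual file: the names create_driver, get_page_title, and main, the driver_factory parameter, the Chromium path /var/task/bin/headless-chromium, and the Chrome flags are all assumptions. Selenium is imported inside create_driver so the scraping logic can be exercised with a stand-in driver.

```python
# lambda_function.py (hypothetical sketch): helpers receive the driver,
# and lambda_handler owns the driver's whole lifecycle.


def create_driver():
    """Build a headless Chrome driver. Paths and flags are assumptions."""
    # Imported lazily so the functions below can be tested without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    # Assumption: the headless Chromium binary is bundled under /var/task/bin.
    options.binary_location = "/var/task/bin/headless-chromium"
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")
    options.add_argument("--single-process")
    options.add_argument("--disable-dev-shm-usage")
    # chromedriver is expected to be found on PATH (PATH=/var/task/bin).
    return webdriver.Chrome(options=options)


def get_page_title(driver, url):
    """Difference 1: Selenium helpers take the driver as a parameter."""
    driver.get(url)
    return driver.title


def main(driver):
    """Your scraper's entry point; every Selenium step uses the driver passed in."""
    title = get_page_title(driver, "https://www.google.com")
    print(f"Visited Google, page title: {title}")
    return {"title": title}


def lambda_handler(event, context, driver_factory=create_driver):
    """Difference 2: main is called from the Lambda entry point."""
    driver = driver_factory()
    try:
        return main(driver)
    finally:
        driver.quit()  # always release Chrome, even if main raises
```

Lambda calls lambda_handler(event, context), so driver_factory keeps its default there. Passing a different factory is only a convenience for testing the scraper logic.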

Testing locally

Test your web scraper locally with make docker-run. Testing locally before packaging for AWS is highly recommended, since it's much easier to debug locally! See the Makefile for the other available commands.
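Alongside make docker-run, it can help to call the handler directly from Python the way Lambda would. The sketch below uses a hypothetical helper that is not part of the starter. FakeLambdaContext only imitates a few attributes of the real Lambda context object (function_name, aws_request_id, get_remaining_time_in_millis), and the 10-second default mirrors the example timeout in the next section. It is mainly useful for checking handler logic. The full scraper is still best run with make docker-run.

```python
import time
import traceback


class FakeLambdaContext:
    """Hypothetical stand-in for the Lambda context object (a few attributes only)."""

    def __init__(self, function_name="local-scraper", timeout_s=10):
        self.function_name = function_name
        self.aws_request_id = "local-request"
        self._deadline = time.monotonic() + timeout_s

    def get_remaining_time_in_millis(self):
        # Mirrors the real method: milliseconds left before the timeout.
        return max(0, int((self._deadline - time.monotonic()) * 1000))


def invoke_locally(handler, event=None, timeout_s=10):
    """Call handler(event, context) as Lambda would and report problems."""
    context = FakeLambdaContext(timeout_s=timeout_s)
    start = time.monotonic()
    try:
        result = handler(event or {}, context)
    except Exception:
        traceback.print_exc()  # full local traceback, then re-raise
        raise
    elapsed = time.monotonic() - start
    if elapsed > timeout_s:
        print(f"Warning: took {elapsed:.1f}s, longer than the {timeout_s}s timeout")
    return result
```

For example, from a Python shell where lambda_function is importable (assumed to be the src folder): `from lambda_function import lambda_handler; invoke_locally(lambda_handler)`.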

Building and uploading the distributable package

Once you're ready to upload to AWS, do the following:

Run make build-lambda-package
Upload the resulting build.zip file to your AWS Lambda function (typically via S3)
Set the Lambda environment variables (same values as in docker-compose.yml):
- PYTHONPATH=/var/task/src:/var/task/lib
- PATH=/var/task/bin
Adjust the Lambda function settings to suit your needs; for the given example:
- Timeout: at least 10 seconds
- Memory: at least 250MB
Invoke your function using the AWS CLI: aws lambda invoke --function-name YOURFUNCTIONNAME out --log-type Tail
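If you prefer scripting these steps over using the console, the upload, configuration, and invocation can be sketched with boto3, the AWS SDK for Python. This sketch is not part of the starter. The bucket name, S3 key, and function name are placeholders, and the timeout/memory defaults simply mirror the minimums listed above. Only build_configuration and decode_invoke_response run without AWS credentials. deploy_and_invoke needs boto3 and a configured AWS account.

```python
import base64
import json

# Environment variables from the steps above (same values as docker-compose.yml).
ENV_VARS = {
    "PYTHONPATH": "/var/task/src:/var/task/lib",
    "PATH": "/var/task/bin",
}


def build_configuration(function_name, timeout_s=10, memory_mb=250):
    """Keyword arguments for the Lambda client's update_function_configuration."""
    return {
        "FunctionName": function_name,
        "Environment": {"Variables": dict(ENV_VARS)},
        "Timeout": timeout_s,
        "MemorySize": memory_mb,
    }


def decode_invoke_response(response):
    """Return (payload, log_tail) from an invoke call made with LogType="Tail"."""
    raw = response["Payload"].read()
    payload = json.loads(raw) if raw else None
    # LogResult holds the last 4 KB of the execution log, base64-encoded.
    log_tail = base64.b64decode(response.get("LogResult", "")).decode("utf-8")
    return payload, log_tail


def deploy_and_invoke(bucket, function_name, key="build.zip"):
    """Upload build.zip via S3, apply the settings above, then invoke once."""
    import boto3  # imported here so the helpers above work without boto3

    boto3.client("s3").upload_file("build.zip", bucket, key)
    client = boto3.client("lambda")
    updated = client.get_waiter("function_updated")

    client.update_function_code(FunctionName=function_name, S3Bucket=bucket, S3Key=key)
    updated.wait(FunctionName=function_name)  # avoid a conflicting second update
    client.update_function_configuration(**build_configuration(function_name))
    updated.wait(FunctionName=function_name)

    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        LogType="Tail",
        Payload=json.dumps({}).encode("utf-8"),
    )
    return decode_invoke_response(response)
```

For example, `payload, logs = deploy_and_invoke("my-bucket", "YOURFUNCTIONNAME")` followed by `print(logs)` shows the same log tail that `--log-type Tail` returns from the CLI. "my-bucket" is a hypothetical bucket name.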

Example

To see what changes a slightly larger web scraper might involve, check out this gist.

noahsb/lambda-selenium-starter

Folders and files

Latest commit

History

Repository files navigation

Lambda Selenium Starter

How does it work?

How is this unique?

Requirements

Quick Start

Testing locally

Building and uploading the distributable package

Example

Shouts to

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages