Just an exercise to fetch site data using threads and save it in a database (DynamoDB).
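The repository's FetchThread.py is not reproduced here. The snippet below is only a rough sketch of the "fetch using threads" idea. It assumes one thread per page and an injected fetch function so it can run without network access. The class body, fetch_all and the fake fetcher are hypothetical, not the author's code.

```python
import threading
from typing import Callable, List, Optional


class FetchThread(threading.Thread):
    """Hypothetical sketch: runs one fetch call in its own thread and keeps the result."""

    def __init__(self, url: str, fetch: Callable[[str], str]) -> None:
        super().__init__()
        self.url = url
        self.fetch = fetch  # e.g. lambda u: requests.get(u, timeout=10).text
        self.result: Optional[str] = None
        self.error: Optional[BaseException] = None

    def run(self) -> None:
        try:
            self.result = self.fetch(self.url)
        except Exception as exc:  # keep the error instead of losing it inside the thread
            self.error = exc


def fetch_all(urls: List[str], fetch: Callable[[str], str]) -> List[Optional[str]]:
    """Start one thread per URL, wait for all of them, return results in input order."""
    threads = [FetchThread(url, fetch) for url in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # block until every page has been fetched (or failed)
    return [t.result for t in threads]


if __name__ == "__main__":
    fake_fetch = lambda url: f"<html>{url}</html>"  # stand-in for a real HTTP GET
    print(fetch_all(["https://example.com/a", "https://example.com/b"], fake_fetch))
```

Because the threads mostly wait on the network, running them in parallel shortens the total fetch time even with the GIL.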
Requirements
- A free AWS account
- That's all :D
How to create lambda Layers
- Open the AWS CloudShell page (make sure you are in the same region as the Lambda function)
- Install the desired Python version (I am using 3.11 for this one)
sudo yum install python3.11
- Install the desired package into the folder layout that Lambda expects for Python layers
python3.11 -m pip install requests -t python/lib/python3.11/site-packages
Note
Some packages do not work when installed this way and throw [ERROR] Runtime.ImportModuleError: Unable to import module during execution. In that case, install them forcing the platform parameters expected by the Lambda machine (the command below targets an x86_64 function; for arm64 use manylinux2014_aarch64):
pip install --platform manylinux2014_x86_64 --target=python --implementation cp --python-version 3.11 --only-binary=:all: --upgrade lxml
- Zip it
zip -r requests_layer.zip python
- Publish the layer
aws lambda publish-layer-version --layer-name requests --zip-file fileb://requests_layer.zip --compatible-runtimes python3.11
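Lambda extracts layers into /opt and adds /opt/python and /opt/python/lib/python3.11/site-packages to the import path, so every file in the zip has to live under a top-level python/ folder. If zip is not available, or you want to check that layout before publishing, a minimal Python equivalent of the zip step could look like this. It assumes the python/ folder from the previous steps; build_layer_zip is a hypothetical helper name.

```python
import os
import zipfile


def build_layer_zip(src_dir: str = "python", zip_path: str = "requests_layer.zip") -> list:
    """Zip src_dir recursively, keeping 'python/...' as the top-level folder in the archive."""
    base = os.path.dirname(os.path.abspath(src_dir))  # parent folder of python/
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                # arcname relative to the parent keeps the leading 'python/' prefix
                zf.write(full, arcname=os.path.relpath(full, base))
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    # Lambda adds /opt/python (and its lib/pythonX.Y/site-packages) to sys.path,
    # so anything outside python/ would not be importable
    bad = [n for n in names if not n.startswith("python/")]
    if bad:
        raise ValueError(f"entries outside python/: {bad[:3]}")
    return names
```

Run build_layer_zip() from the folder that contains python/, then publish the resulting zip with the aws lambda publish-layer-version command above.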
How to create DynamoDB table
- Open the DynamoDB page
- Click the "Create table" button
- Enter the table name (we are using BairesDevJobs for this exercise)
- Enter the partition key, which acts as the primary key since no sort key is used (we are using jobID for this exercise)
- Click the "Create table" button
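The same table can be described in code. This sketch builds the keyword arguments for boto3's DynamoDB client create_table() call and mirrors the console steps above. It assumes jobID is stored as a string and uses on-demand capacity, since the steps above don't specify either; job_table_params is a hypothetical helper name.

```python
def job_table_params(table_name: str = "BairesDevJobs", key: str = "jobID") -> dict:
    """Keyword arguments for boto3's DynamoDB client create_table()."""
    return {
        "TableName": table_name,
        # Partition (HASH) key only, no sort key
        "KeySchema": [{"AttributeName": key, "KeyType": "HASH"}],
        # Assumption: jobID is stored as a string ("S")
        "AttributeDefinitions": [{"AttributeName": key, "AttributeType": "S"}],
        # Assumption: on-demand capacity, so no read/write units need to be chosen
        "BillingMode": "PAY_PER_REQUEST",
    }


if __name__ == "__main__":
    params = job_table_params()
    print(params["KeySchema"])
    # With boto3 installed and credentials configured (e.g. in CloudShell):
    # import boto3
    # boto3.client("dynamodb").create_table(**params)
```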
How to create lambda function
- Open the AWS Lambda page
- Click the "Create function" button
- Select a Python runtime version that is also available in CloudShell (3.11 for this exercise)
- Enable the function URL in the "Additional settings" section.
- Select auth type NONE. This makes the URL public, so anyone who has it can call the function; just don't share it.
- In the Layers section at the bottom of the function's Code tab:
- Click "Add a layer"
- Select "Custom layers"
- Choose the desired layer
- Repeat until all required layers have been added (we need the requests and lxml layers for this exercise)
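Important
If a layer is not showing, review the How to create lambda Layers section and check that the layer was published with a compatible runtime matching the function's (python3.11 here).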
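To confirm the layers are attached and importable before deploying the real code, a throwaway handler along these lines could be run once with the console's Test button. It is a sketch: check_imports is a hypothetical helper, and it imports lxml.etree rather than just lxml so the compiled extension module is actually loaded, which is where a wheel built for the wrong platform fails.

```python
import importlib
import json


def check_imports(modules=("requests", "lxml.etree")) -> dict:
    """Try importing each module and report 'ok' or the import error message."""
    report = {}
    for name in modules:
        try:
            importlib.import_module(name)
            report[name] = "ok"
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            report[name] = f"failed: {exc}"
    return report


def lambda_handler(event, context):
    # Hypothetical temporary handler: test once, then restore the real lambda_function.py
    return {"statusCode": 200, "body": json.dumps(check_imports())}
```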
How to add permissions to the lambda function
- Open the Configuration tab on the Lambda function page.
- Select the Permissions sub-tab.
- Click the IAM role link (it is just below "Role name").
- Click "Add permissions" > "Attach policies" and select a policy that allows GetItem and UpdateItem on DynamoDB.
- If you don't have any policy with those actions, create a new one (see the How to create a permission policy section).
- Click the "Add permissions" button.
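GetItem and UpdateItem are the two calls a read-then-save flow keyed on jobID needs: GetItem looks up one job by its partition key, and UpdateItem edits an item or creates it if it does not exist yet. The sketch below shows what such requests could look like for boto3's low-level DynamoDB client. It is not taken from the repository's code; it assumes jobID is a string, and the title attribute and both helper names are hypothetical.

```python
def get_job_params(job_id: str, table: str = "BairesDevJobs") -> dict:
    """Arguments for the DynamoDB client's get_item(): fetch one job by its partition key."""
    return {"TableName": table, "Key": {"jobID": {"S": job_id}}}


def save_job_params(job_id: str, title: str, table: str = "BairesDevJobs") -> dict:
    """Arguments for update_item(): updates the item, or creates it if missing."""
    return {
        "TableName": table,
        "Key": {"jobID": {"S": job_id}},
        # '#t' and ':t' are placeholders; 'title' is a hypothetical attribute name
        "UpdateExpression": "SET #t = :t",
        "ExpressionAttributeNames": {"#t": "title"},
        "ExpressionAttributeValues": {":t": {"S": title}},
    }
```

With boto3 these would be passed as client.get_item(**get_job_params("123")) and client.update_item(**save_job_params("123", "QA Engineer")), where client = boto3.client("dynamodb").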
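How to create a permission policy
- Open the IAM Policies page
- Click the "Create policy" button.
- Select the allowed actions, or edit the JSON directly.
- For this example I reused a JSON policy I already had, but not all of its actions are required. Note that "Resource": "*" applies them to every table in the account.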
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "dynamodb:BatchGetItem",
        "dynamodb:BatchWriteItem",
        "dynamodb:PutItem",
        "dynamodb:DeleteItem",
        "dynamodb:GetItem",
        "dynamodb:Scan",
        "dynamodb:Query",
        "dynamodb:UpdateItem"
      ],
      "Resource": "*"
    }
  ]
}
- Click the "Next" button
- Enter the policy name
- Click the "Create policy" button
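Since the function only needs GetItem and UpdateItem, a tighter alternative is to allow just those two actions on the one table. Here is a small generator for that policy, assuming the BairesDevJobs table; the region and account ID in the example are placeholders. Its output can be pasted into the JSON editor.

```python
import json


def least_privilege_policy(region: str, account_id: str, table: str = "BairesDevJobs") -> str:
    """Policy document allowing only GetItem and UpdateItem on a single DynamoDB table."""
    table_arn = f"arn:aws:dynamodb:{region}:{account_id}:table/{table}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:UpdateItem"],
                "Resource": table_arn,  # scoped to one table instead of "*"
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # Placeholder region and account ID; replace them with your own
    print(least_privilege_policy("eu-central-1", "123456789012"))
```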
How to use
- Copy the extractor.py, FetchThread.py and lambda_function.py files to the Lambda function and click Deploy.
- Open the function URL in your preferred browser and/or RSS reader (you can find it on the right side of the Lambda function diagram).
- To filter jobs, pass the search term in the URL path.
- Example: https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.lambda-url.eu-central-1.on.aws/qa will list only QA jobs.
Tip
You can enable debug mode by passing debug as a query string parameter.
Example: https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.lambda-url.eu-central-1.on.aws/qa?debug
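lambda_function.py is not reproduced here, so this is only a sketch of how a handler could read the search term and the debug flag. It assumes the function URL event (payload format 2.0), where the path arrives in rawPath and the query string in rawQueryString; parse_request is a hypothetical helper name. unquote is applied in case the path arrives percent-encoded.

```python
from urllib.parse import parse_qs, unquote


def parse_request(event: dict) -> tuple:
    """Extract (search term, debug flag) from a Lambda function URL event."""
    # rawPath is "/" when no filter is given, "/qa" for the QA example
    search = unquote(event.get("rawPath", "/")).strip("/")
    # keep_blank_values=True so "?debug" (with no "=value") is still detected
    query = parse_qs(event.get("rawQueryString", ""), keep_blank_values=True)
    return search, "debug" in query


if __name__ == "__main__":
    sample = {"rawPath": "/qa", "rawQueryString": "debug"}  # minimal function URL event
    print(parse_request(sample))
```

An empty search term would then mean "list all jobs", matching the unfiltered URL.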