Config file change detection fix #78

Merged
merged 4 commits into from
Apr 22, 2022
5 changes: 4 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -57,4 +57,7 @@ Graphs for error and prediction vs epoch can be found corresponding folders in `
To invoke only with the configurations defined in `config.json`, use the `-i` flag
```bash
spot <function_name> -i
```
```

### SPOT UML Class Diagram
![SPOT UML Class Diagram](/spot/visualize/SPOT_UML_Class_Diagram.jpeg)
4 changes: 3 additions & 1 deletion spot/Spot.py
Expand Up @@ -86,7 +86,9 @@ def invoke(self):
self.log_prop_waiter.wait_by_count(start, self.function_invocator.invoke_cnt)

def collect_data(self):
# retrieve logs
# retrieve latest config, logs, pricing scheme
self.config_retriever.get_latest_config()
self.price_retriever.fetch_current_pricing()
self.last_log_timestamp = self.log_retriever.get_logs()

def train_model(self):
7 changes: 3 additions & 4 deletions spot/configs/README.md
@@ -1,9 +1,8 @@
## AWS Config Retriever
* Retrieves current configs of the given function
* Populates them in the MongoDB database
This module fetches the current serverless function configuration through Boto3. If the configuration has changed since the last saved config, it is saved to the local database with a timestamp so that it can be correlated with logs.
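
The change check can be sketched as follows. This is a minimal illustration, not the code merged in this PR: `VOLATILE_FIELDS`, `strip_volatile`, and `config_changed` are hypothetical names, and the field list is simply the set of keys that `add_new_config_if_changed` deletes before comparing.

```python
import copy

# Fields assumed to differ on every fetch or save even when the function
# configuration itself is unchanged (taken from the keys deleted in db.py).
VOLATILE_FIELDS = (
    "_id",
    "LastModified",
    "LastModifiedInMs",
    "RevisionId",
    "ResponseMetadata",
)


def strip_volatile(config):
    """Return a deep copy of config without the volatile fields."""
    return {
        key: copy.deepcopy(value)
        for key, value in config.items()
        if key not in VOLATILE_FIELDS
    }


def config_changed(latest_saved, current):
    """Return True if nothing is saved yet or the stable fields differ."""
    if latest_saved is None:
        return True
    return strip_volatile(latest_saved) != strip_volatile(current)
```

Filtering keys instead of using `del` means a missing field (for example a document without `LastModifiedInMs`) does not raise `KeyError`.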

### Example Log Output:
```
### Example Config Output:
```json
{
"_id" : ObjectId("61b3beaee841f6f62091b06d"),
"FunctionName" : "AWSHelloWorld",
2 changes: 0 additions & 2 deletions spot/configs/aws_config_retriever.py
@@ -1,6 +1,4 @@
import os
import boto3
import time as time
from spot.db.db import DBClient
import datetime

3 changes: 3 additions & 0 deletions spot/db/README.md
@@ -0,0 +1,3 @@
## DB

The database client class (`DBClient`) adds a layer of abstraction that lets the different components of SPOT interact with the local MongoDB database. The module uses the PyMongo package to talk to the MongoDB client through a well-defined API. It saves, modifies, and reads data in the local database.
31 changes: 19 additions & 12 deletions spot/db/db.py
Expand Up @@ -50,19 +50,26 @@ def add_new_config_if_changed(self, function_name, collection_name, document):
function_db = self.client[function_name]
collection = function_db[collection_name]

latest_config = collection.find_one(sort=[("_id", pymongo.DESCENDING)])
if latest_config:
del latest_config["_id"]
del latest_config["LastModified"]
del latest_config["RevisionId"]
del latest_config["LastModifiedInMs"]

test = document.copy()
del test["LastModified"]
del test["RevisionId"]

if not latest_config == test:
latest_saved_config = collection.find_one(sort=[("_id", pymongo.DESCENDING)])

# Delete unique identifier fields so that the current and most recently saved configs can be compared
if latest_saved_config:
del latest_saved_config["_id"]
del latest_saved_config["LastModified"]
del latest_saved_config["RevisionId"]
del latest_saved_config["LastModifiedInMs"]
del latest_saved_config["ResponseMetadata"]

current_config = document.copy()
del current_config["LastModified"]
del current_config["LastModifiedInMs"]
del current_config["RevisionId"]
del current_config["ResponseMetadata"]

if not latest_saved_config == current_config:
collection.insert_one(document)
elif not current_config.keys() == latest_saved_config.keys():
print("Warning: AWS might have changed configuration parameters")

def execute_query(
self, function_name, collection_name, select_fields, display_fields
67 changes: 36 additions & 31 deletions spot/invocation/README.md
@@ -1,4 +1,6 @@
# AWS Function Invocator
The automatic invocator triggers benchmark functions to produce logs for model fitting and error calculation, then waits for the data to become available on the cloud for retrieval. It is based on the source code of FaaSProfiler, an open-source tool by Princeton University.

Adapted from [FaaSProfiler](https://github.com/PrincetonUniversity/faas-profiler).
## Initialization and Invocation
The invocator takes a user-defined workload JSON file on initialization and asynchronously invokes all the function instances in it on `invoke_all()`. The memory size limit is 128 MB by default and can optionally be specified when calling `invoke_all()`.
Expand All @@ -7,39 +9,42 @@ ivk = AWSFuncctionInvocator(<path/to/workload>)
ivk.invoke_all(<memory_size>)
```

## Workload Structure
Below is an example of workload input to the invocator
The module takes a JSON file with serverless function metadata as input and, based on this configuration file, sends multiple asynchronous requests through Boto3 to trigger the functions. Serverless function invocations are observed to follow statistical distribution patterns. To emulate these patterns, the invocator requires four parameters:

### Workload File Structure
1. **Invocation Pattern:** This parameter controls the distribution of time intervals between invocations of a function. It implements uniform and Poisson distributions, which are the most common invocation patterns for serverless functions based on information from our client. In addition, a user can replay an invocation pattern from the actual usage of their serverless function.
Whether a function instance is already loaded (warm start) can greatly affect runtime and cost. Letting users specify the invocation distribution that most closely matches their function’s workload minimizes profiling error and provides more accurate data for optimization.

2. **Invocation Rate:** The invocation rate specifies the average frequency of invocation within the specified distribution. This allows users to control the volume and intensity of invocations so that they closely resemble the real-life invocation distribution.

3. **Function Input (Payload):** The input group used in one cycle of the invocation sequence is stored in a separate JSON file, and the path to this file is specified in the invocation configuration file. This separation enables quick modification of the input group, which can save development and deployment time.

4. **Invocation Duration and Active Duration:** Users can define the duration of an invocation run (e.g. 15 s) and the active window for a function within that run (e.g. seconds 10 to 15 of the invocation duration). This design lets users stack patterns by running multiple instances in the same invocation duration under different active windows. The two durations can be set to be the same if no such customization is desired.
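
As an illustration of how the pattern, rate, and active window interact, here is a minimal sketch of generating the gaps between invocations. It is not the invocator's code: `interarrival_times` is a hypothetical helper, the rate is assumed to be in invocations per second, and "Uniform" is assumed to mean evenly spaced invocations.

```python
import random


def interarrival_times(distribution, rate, window, seed=None):
    """Return the gaps (in seconds) between invocations inside a window.

    distribution: "Uniform" (evenly spaced) or "Poisson" (exponential gaps)
    rate: assumed average number of invocations per second, must be > 0
    window: (start, end) of the active window in seconds
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    rng = random.Random(seed)
    span = window[1] - window[0]
    gaps = []
    elapsed = 0.0
    while True:
        if distribution == "Uniform":
            gap = 1.0 / rate
        elif distribution == "Poisson":
            # A Poisson process has exponentially distributed gaps.
            gap = rng.expovariate(rate)
        else:
            raise ValueError(f"unsupported distribution: {distribution}")
        if elapsed + gap > span:
            break
        gaps.append(gap)
        elapsed += gap
    return gaps
```

Passing a seed (compare `random_seed` in the workload file) makes a Poisson run repeatable, which helps when comparing profiling runs.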

### Log Propagation Waiter
There is a non-negligible delay between a function invocation and its logs becoming available for retrieval. To avoid inconsistency between function invocation and log retrieval, the invocator uses a waiter module after each invocation run to wait for the logs to propagate to AWS CloudWatch. The waiter takes a start time and the expected count of new logs, then repeatedly checks how many logs are available after the start time until the expected count is reached or a timeout occurs.
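
The waiting logic amounts to polling with a deadline. The sketch below is a stand-in, not the module's `wait_by_count`: `wait_for_logs` and the `count_logs_since` callable are hypothetical, and the timeout and polling interval defaults are arbitrary.

```python
import time


def wait_for_logs(count_logs_since, start_time, expected_count,
                  timeout_s=60.0, poll_interval_s=1.0):
    """Poll until enough logs newer than start_time exist or time runs out.

    count_logs_since: hypothetical callable returning the number of logs
        available after start_time (e.g. a CloudWatch query wrapper).
    Returns True if expected_count was reached, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if count_logs_since(start_time) >= expected_count:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

Returning a boolean instead of raising lets the caller decide whether a timeout should abort the run or only trigger a warning.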

## Example Workload File
``` json
{
"test_name": "test",
"test_duration_in_seconds": 15,
"random_seed": 100,
{
"blocking_cli": false,
"instances":{
"instance1":{
"application": "chameleon",
"instances": {
"instance1": {
"activity_window": [
5,
10
],
"application": "pyaes",
"distribution": "Poisson",
"host": "x8siu0es68.execute-api.us-east-2.amazonaws.com",
"payload": "payload.json",
"rate": 5,
"activity_window": [5, 10],
"payload": "poisson.json",
"host": "ng11cbhnw7.execute-api.us-west-1.zonaws.com",
"stage":"default",
"resource":"chameleon"
},
"instance2":{
"application": "chameleon",
"interarrivals_list": [5,0.13,0.15,0.8,0.1,0.13,0.13,0.1,0.4],
"host": "ng11cbhnw7.execute-api.us-west-1.zonaws.com",
"stage":"default",
"resource":"chameleon"
},
}
"resource": "/pyaes?format=json",
"stage": "default"
}
},
"random_seed": 100,
"test_duration_in_seconds": 15,
"test_name": "IntegrationTest1"
}
```

`"application"` is the key field which determines which function will be invoked. It should be a full ARN, partial ARN, or the name of the function.

To use a customized invocation interval, define `interarrivals_list` instead of `distribution`, as in `instance2`. `interarrivals_list` has a higher priority than `distribution`, so if both are specified the customized interarrival times are used for invocation.

`payload` is the path to a JSON file with cloud function inputs. It can be omitted if the function does not require any input.

```
9 changes: 6 additions & 3 deletions spot/logs/README.md
@@ -1,9 +1,12 @@
## AWS Log Retriever
* Retrieves logs of the given function
* Populates them in the MongoDB database
The logs are retrieved using Boto3. Log retrieval is designed to work independently of function invocation, so the function can continue to run and logs are retrieved only when they are needed to optimize the configuration of the serverless function. When the retriever is used, it gathers all logs since the last retrieval, parses them so that their data is indexed for easy searching, and stores them in the database.

A list of invocation IDs that were triggered using the automatic function invocator is also stored in the database and matched with the logs as they are retrieved from AWS. This confirms that all invoked functions have matching logs and that the dataset is complete.
Logs for a particular function can be retrieved through the CLI tool using the `--fetch` flag.
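
The completeness check described above reduces to a set difference. A minimal sketch, assuming each parsed log record carries the invocation ID under a field name that is hypothetical here (`requestId`); `find_missing_invocations` is likewise not part of SPOT:

```python
def find_missing_invocations(invoked_ids, log_records, id_field="requestId"):
    """Return the invocation IDs that have no matching log record yet.

    id_field is an assumed key name; adjust it to the stored log schema.
    """
    logged_ids = {record[id_field] for record in log_records if id_field in record}
    return sorted(set(invoked_ids) - logged_ids)
```

An empty result means every triggered invocation has a matching log.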


### Example Log Output:
```
```json
{
"_id" : ObjectId("61b3b35cb537d4c93cb3c8b7"),
"events" : [
12 changes: 12 additions & 0 deletions spot/prices/README.md
@@ -0,0 +1,12 @@
## AWS Price Retriever
The pricing scheme for serverless functions is retrieved using Infracost's Cloud Pricing API, which aggregates pricing information from major cloud vendors. The API requires the cloud vendor, service, product family, and region in order to fetch the matching pricing scheme. Our tool is interested in the pricing scheme of AWS Lambda functions, which differs by region and combines a duration price that scales with invocation duration and memory size with a fixed per-request price. When the request returns successfully, the price retriever parses the response and saves the request price, duration price, and region to the local database. This ensures that the model always has up-to-date prices and that historical logs are associated with the pricing scheme that was in effect when they were generated.

### Example Pricing Scheme:
```
{
"request_price": 2e-7,
"duration_price": 0.0000166667,
"timestamp": 163980957281,
"Region": "us-east-1"
}
```
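
For illustration, the two stored prices can be combined into a per-invocation cost as sketched below. This is an assumption-laden example, not SPOT's cost model: `invocation_cost` is a hypothetical helper, and `duration_price` is assumed to be quoted per GB-second, which matches the public AWS Lambda price list for the value shown above.

```python
def invocation_cost(pricing, memory_mb, billed_duration_ms):
    """Estimate the cost of one invocation in the pricing scheme's currency.

    Assumes pricing["duration_price"] is per GB-second and
    pricing["request_price"] is a flat fee per request.
    """
    gb_seconds = (memory_mb / 1024) * (billed_duration_ms / 1000)
    return pricing["request_price"] + pricing["duration_price"] * gb_seconds


pricing = {"request_price": 2e-7, "duration_price": 0.0000166667}
cost = invocation_cost(pricing, memory_mb=1024, billed_duration_ms=1000)
```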
2 changes: 2 additions & 0 deletions spot/recommendation_engine/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
## Recommendation Engine
The recommendation engine uses the generated statistical model to predict costs for various configurations, then selects and recommends the configuration with the lowest predicted cost. Upon user request, it also updates the serverless function configuration on AWS with the recommended configuration.
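
The selection step can be pictured as a search over candidate memory sizes. The sketch below is hypothetical: `recommend_memory` and the `predict_duration_ms` callable stand in for the engine and its fitted model, and the duration price is assumed to be per GB-second.

```python
def recommend_memory(predict_duration_ms, memory_options_mb, pricing):
    """Return (memory_mb, predicted_cost) for the cheapest candidate.

    predict_duration_ms: hypothetical callable mapping a memory size in MB
        to the predicted billed duration in milliseconds.
    """
    if not memory_options_mb:
        raise ValueError("memory_options_mb must not be empty")
    best = None
    for memory_mb in memory_options_mb:
        duration_ms = predict_duration_ms(memory_mb)
        gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
        cost = pricing["request_price"] + pricing["duration_price"] * gb_seconds
        if best is None or cost < best[1]:
            best = (memory_mb, cost)
    return best
```

More memory often shortens the duration but raises the price per second, so the cheapest option is not necessarily the smallest or the fastest one.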
5 changes: 5 additions & 0 deletions spot/visualize/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
## Visualization Module
### Error vs Epoch Graph
The module creates an error vs. epoch graph for the specified serverless function. The graph shows how the error rate changes over successive fitting cycles.
### Recommended Memory Size vs Epoch Graph
The module creates a recommended memory size vs. epoch graph for the specified serverless function. The graph shows how the recommended memory size changes over successive fitting cycles.
Binary file added spot/visualize/SPOT_UML_Class_Diagram.jpeg