ubc-cirrus-lab · efeberkeevci · Apr 22, 2022 · Apr 15, 2022 · Apr 22, 2022 · Apr 22, 2022
diff --git a/README.md b/README.md
@@ -57,4 +57,7 @@ Graphs for error and prediction vs epoch can be found corresponding folders in `
 To invoke only with the configurations defined in `config.json`, use `-i` flag
 ```bash
 spot <function_name> -i
-```
+```
+
+### SPOT UML Class Diagram
+![SPOT UML Class Diagram](/spot/visualize/SPOT_UML_Class_Diagram.jpeg)
diff --git a/spot/Spot.py b/spot/Spot.py
@@ -86,7 +86,9 @@ def invoke(self):
         self.log_prop_waiter.wait_by_count(start, self.function_invocator.invoke_cnt)
 
     def collect_data(self):
-        # retrieve logs
+        # retrieve latest config, logs, pricing scheme
+        self.config_retriever.get_latest_config()
+        self.price_retriever.fetch_current_pricing()
         self.last_log_timestamp = self.log_retriever.get_logs()
 
     def train_model(self):

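The `invoke()` method above calls `self.log_prop_waiter.wait_by_count(start, self.function_invocator.invoke_cnt)` after each run. The class below is a rough sketch of what a count-based waiter like this might do, not SPOT's actual implementation. It repeatedly asks a counting function how many logs exist after `start` until the expected count is reached or a timeout expires. The `count_logs_since` callable and the `timeout` and `poll_interval` parameters are hypothetical; a real version would query CloudWatch Logs, for example through Boto3.

```python
import time
from typing import Callable


class LogPropagationWaiter:
    """Hypothetical count-based waiter (a sketch, not SPOT's actual class)."""

    def __init__(
        self,
        count_logs_since: Callable[[float], int],
        timeout: float = 300.0,
        poll_interval: float = 5.0,
    ):
        # Assumed: count_logs_since(start) returns how many log entries
        # (e.g. Lambda REPORT lines in CloudWatch) exist after `start`.
        self.count_logs_since = count_logs_since
        self.timeout = timeout
        self.poll_interval = poll_interval

    def wait_by_count(self, start: float, expected_count: int) -> bool:
        """Poll until `expected_count` logs exist after `start`, or time out.

        Returns True if the expected number of logs was observed,
        False if the timeout expired first.
        """
        deadline = time.monotonic() + self.timeout
        while True:
            if self.count_logs_since(start) >= expected_count:
                return True
            if time.monotonic() >= deadline:
                return False
            time.sleep(self.poll_interval)
```

Using `time.monotonic()` for the deadline keeps the timeout unaffected by wall-clock adjustments, while `start` is passed through unchanged so it can be whatever timestamp format the log query expects.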
diff --git a/spot/configs/README.md b/spot/configs/README.md
@@ -1,9 +1,8 @@
 ## AWS Config Retriever
-* Retrieves current configs of the given function 
-* Populates them in the MongoDB database
+This module fetches the current serverless function configuration through Boto3. If the configuration has changed since the last saved config, it is saved to the local database with a timestamp so it can be correlated with logs.
 
-### Example Log Output:
-```
+### Example Config Output:
+```json
 {
         "_id" : ObjectId("61b3beaee841f6f62091b06d"),
         "FunctionName" : "AWSHelloWorld",

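The save-only-if-changed behaviour of the config retriever described above can be sketched without MongoDB. The example assumes a Boto3 Lambda client (`boto3.client("lambda")`), whose `get_function_configuration(FunctionName=...)` call returns the configuration as a dict. The in-memory `saved` list stands in for the MongoDB collection, and the `Timestamp` field name, the `VOLATILE_FIELDS` tuple, and the helper names are assumptions for illustration. It also warns when the set of configuration keys changes, which may mean AWS added or removed parameters.

```python
import time
from typing import Any, Dict, List, Optional

# Fields that differ between otherwise identical configs (assumed list,
# modelled on the fields DBClient.add_new_config_if_changed discards).
VOLATILE_FIELDS = ("_id", "LastModified", "LastModifiedInMs", "RevisionId",
                   "ResponseMetadata", "Timestamp")


def comparable(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` without the fields that change on every save."""
    return {k: v for k, v in config.items() if k not in VOLATILE_FIELDS}


def save_config_if_changed(lambda_client, function_name: str,
                           saved: List[Dict[str, Any]]) -> bool:
    """Fetch the current config and append it to `saved` only if it changed.

    `lambda_client` is assumed to be boto3.client("lambda") or a stand-in
    exposing the same get_function_configuration method.
    """
    current = lambda_client.get_function_configuration(FunctionName=function_name)
    latest: Optional[Dict[str, Any]] = saved[-1] if saved else None

    # A different key set suggests AWS changed the configuration schema.
    if latest is not None and comparable(latest).keys() != comparable(current).keys():
        print("Warning: AWS might have changed configuration parameters")

    if latest is None or comparable(latest) != comparable(current):
        # The timestamp lets stored configs be correlated with logs later on.
        saved.append({**current, "Timestamp": int(time.time() * 1000)})
        return True
    return False
```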
diff --git a/spot/configs/aws_config_retriever.py b/spot/configs/aws_config_retriever.py
@@ -1,6 +1,4 @@
-import os
 import boto3
-import time as time
 from spot.db.db import DBClient
 import datetime
 

diff --git a/spot/db/README.md b/spot/db/README.md
@@ -0,0 +1,3 @@
+## DB
+
+The database client class (`DBClient`) adds a layer of abstraction through which the different components of SPOT interact with the local MongoDB database. It uses the PyMongo package to access MongoDB through a well-defined API, and saves, modifies, and reads various data in the local database.
diff --git a/spot/db/db.py b/spot/db/db.py
@@ -50,19 +50,26 @@ def add_new_config_if_changed(self, function_name, collection_name, document):
         function_db = self.client[function_name]
         collection = function_db[collection_name]
 
-        latest_config = collection.find_one(sort=[("_id", pymongo.DESCENDING)])
-        if latest_config:
-            del latest_config["_id"]
-            del latest_config["LastModified"]
-            del latest_config["RevisionId"]
-            del latest_config["LastModifiedInMs"]
-
-        test = document.copy()
-        del test["LastModified"]
-        del test["RevisionId"]
-
-        if not latest_config == test:
+        latest_saved_config = collection.find_one(sort=[("_id", pymongo.DESCENDING)])
+
+        # Remove per-revision fields (IDs, timestamps, request metadata) so the current config can be compared with the most recently saved one
+        if latest_saved_config:
+            del latest_saved_config["_id"]
+            del latest_saved_config["LastModified"]
+            del latest_saved_config["RevisionId"]
+            del latest_saved_config["LastModifiedInMs"]
+            del latest_saved_config["ResponseMetadata"]
+
+        current_config = document.copy()
+        del current_config["LastModified"]
+        del current_config["LastModifiedInMs"]
+        del current_config["RevisionId"]
+        del current_config["ResponseMetadata"]
+
+        if latest_saved_config and current_config.keys() != latest_saved_config.keys():
+            print("Warning: AWS might have changed configuration parameters")
+        if latest_saved_config != current_config:
             collection.insert_one(document)
 
     def execute_query(
         self, function_name, collection_name, select_fields, display_fields

diff --git a/spot/invocation/README.md b/spot/invocation/README.md
@@ -1,4 +1,6 @@
 # AWS Function Invocator
+The automatic invocator triggers benchmark functions to produce logs for model fitting and error calculation, then waits for the data to become available in the cloud for retrieval. It is based on the source code of FaaSProfiler, an open-source tool from Princeton University.
+
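-Adapted form [FaaSProfiler](https://github.com/PrincetonUniversity/faas-profiler).
-## Initiliaztion and Invocation
-The invocator takes a user-defined workload JSON file on initialization and asynchronously invokes all the functions instances in it on `invoke_all()`. The memory size limit is 128MB by default and it can be optionally specified when calling `invoke_all()`.
+Adapted from [FaaSProfiler](https://github.com/PrincetonUniversity/faas-profiler).
+## Initialization and Invocation
+The invocator takes a user-defined workload JSON file on initialization and asynchronously invokes all the function instances in it on `invoke_all()`. The memory size limit is 128 MB by default and can optionally be specified when calling `invoke_all()`.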
@@ -7,39 +9,42 @@ ivk = AWSFuncctionInvocator(<path/to/workload>)
 ivk.invoke_all(<memory_size>)
 ```
 
-## Workload Structure
-Below is an example of workload input to the invocator
+The module takes a JSON file with serverless function metadata as input and uses Boto3 to send multiple asynchronous requests that trigger the functions as specified in a configuration file. Serverless function invocations have been observed to follow statistical distribution patterns. To emulate such a pattern, the invocator requires four parameters:
+
+### Workload File Structure
+1. **Invocation Pattern:** This parameter controls the distribution of time intervals between invocations of a function. Uniform and Poisson distributions are implemented, as these are the most common invocation patterns for serverless functions according to information from our client. In addition, users can replay an invocation pattern taken from the actual usage of their serverless function.
+Whether a function runs in an already-initialized execution environment (a warm start) or needs a new one (a cold start) can greatly affect runtime and cost. Letting users specify the invocation distribution that most closely matches their function's workload reduces profiling error and yields more accurate data for optimization.
+
+2. **Invocation Rate:** The average frequency of invocations under the chosen distribution. This lets users control the volume and intensity of invocations so that they resemble the real-world invocation pattern.
+
+3. **Function Input (Payload):** The set of inputs used in one cycle of invocations is stored in a separate JSON file, whose path is specified in the workload file. This separation makes the inputs quick to modify, which can save development and deployment time.
+
+4. **Invocation Duration and Active Duration:** Users can define the duration of an invocation run (e.g. 15 s) and the active window of a function within that run (e.g. from second 5 to second 10). This design lets users stack patterns by running multiple instances within the same invocation duration under different active windows. If no such customization is needed, the two durations can be set to the same value.
+
+### Log Propagation Waiter
+There is a non-negligible delay between a function invocation and its logs becoming available for retrieval. To keep invocation and log retrieval consistent, the invocator uses a waiter module after each invocation run to wait for the logs to propagate to AWS CloudWatch. The waiter takes a start time and the expected number of new logs, then repeatedly checks how many logs are available after the start time until that number is reached or a timeout occurs.
+
+## Example Workload File
 ``` json
-{                                                          
-    "test_name": "test",
-    "test_duration_in_seconds": 15,
-    "random_seed": 100,
+{
     "blocking_cli": false,
-    "instances":{
-        "instance1":{
-            "application": "chameleon",
+    "instances": {
+        "instance1": {
+            "activity_window": [
+                5,
+                10
+            ],
+            "application": "pyaes",
             "distribution": "Poisson",
+            "host": "x8siu0es68.execute-api.us-east-2.amazonaws.com",
+            "payload": "payload.json",
             "rate": 5,
-            "activity_window": [5, 10],
-            "payload": "poisson.json",
-            "host": "ng11cbhnw7.execute-api.us-west-1.zonaws.com",
-            "stage":"default",
-            "resource":"chameleon"
-        },
-        "instance2":{
-            "application": "chameleon",
-            "interarrivals_list": [5,0.13,0.15,0.8,0.1,0.13,0.13,0.1,0.4],
-            "host": "ng11cbhnw7.execute-api.us-west-1.zonaws.com",
-            "stage":"default",
-            "resource":"chameleon"
-        },
-    }
+            "resource": "/pyaes?format=json",
+            "stage": "default"
+        }
+    },
+    "random_seed": 100,
+    "test_duration_in_seconds": 15,
+    "test_name": "IntegrationTest1"
 }
-```
-
-`"application"` is the key field which determines which function will be invoked. It should be a full ARN, partial ARN, or the name of the function.
-
-To use a customized invocation interval, define `interarrival_list` instead of `distribution` as in `instance2`. `interarrival_list` has a higher priority than distribution so if both are specified the customized interarrival time will be used for invocation.
-
-`payload` is the path to a JSON file with cloud function inputs. Can skip if the function does not require any input.
-
+```
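The invocation pattern, rate, and activity window parameters described above together determine when each request is sent. The sketch below generates invocation times for one instance of a workload file like the example. It assumes that `rate` is in invocations per second, that a Poisson process is modelled with exponentially distributed gaps of mean `1 / rate`, and that "Uniform" means evenly spaced invocations. `invocation_times` is a hypothetical helper, not FaaSProfiler's or SPOT's actual code.

```python
import random
from typing import List, Sequence


def invocation_times(distribution: str, rate: float,
                     activity_window: Sequence[float],
                     seed: int = 100) -> List[float]:
    """Return invocation offsets (seconds from the start of the run).

    Hypothetical helper: all times fall inside [window_start, window_end).
    - "Poisson": gaps drawn from an exponential distribution with mean 1/rate.
    - "Uniform": evenly spaced invocations, 1/rate apart (assumed meaning).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    start, end = activity_window

    if distribution == "Uniform":
        # Index-based spacing avoids floating-point drift from repeated addition.
        n = int((end - start) * rate)
        return [start + k / rate for k in range(n)]

    if distribution == "Poisson":
        rng = random.Random(seed)  # fixed seed => reproducible schedule
        times: List[float] = []
        t = start
        while True:
            t += rng.expovariate(rate)  # exponential gap, mean 1 / rate
            if t >= end:
                return times
            times.append(t)

    raise ValueError(f"unsupported distribution: {distribution}")
```

Passing the workload's `random_seed` keeps the Poisson schedule reproducible, and giving several instances different `activity_window` values within one `test_duration_in_seconds` stacks their patterns as described in item 4.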
diff --git a/spot/logs/README.md b/spot/logs/README.md
@@ -1,9 +1,12 @@
 ## AWS Log Retriever
-* Retrieves logs of the given function 
-* Populates them in the MongoDB database
+Logs are retrieved using Boto3. Log retrieval is designed to work independently of function invocation, so the function can keep running and logs are retrieved only when they are needed to optimize the function's configuration. Each time the retriever runs, it gathers all logs produced since the last retrieval, parses them so that their fields are indexed for easy searching, and stores them in the database.
+
+The IDs of invocations triggered by the automatic function invocator are also stored in the database and matched against the logs as they are retrieved from AWS. This confirms that every invoked function has matching logs and that the dataset is complete.
+Logs for a particular function can be retrieved through the CLI tool using the `--fetch` flag.
+
 
 ### Example Log Output:
-```
+```json
 {
         "_id" : ObjectId("61b3b35cb537d4c93cb3c8b7"),
         "events" : [

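Matching invocation IDs against retrieved logs, as described in the log retriever README above, can be illustrated with the `REPORT` line that AWS Lambda writes to CloudWatch Logs at the end of every invocation. The parser below is a sketch: the regular expression covers only the fields shown, and `parse_report` and `missing_invocations` are hypothetical helpers rather than SPOT's API. In a real retriever, the messages would come from the CloudWatch Logs `filter_log_events` call (via Boto3), with `startTime` set to the last retrieval time.

```python
import re
from typing import Dict, Iterable, Optional, Set

# Matches the REPORT line Lambda emits after each invocation, e.g.
# "REPORT RequestId: abc\tDuration: 102.25 ms\tBilled Duration: 103 ms\t..."
REPORT_RE = re.compile(
    r"REPORT RequestId: (?P<request_id>\S+)\s+"
    r"Duration: (?P<duration>[\d.]+) ms\s+"
    r"Billed Duration: (?P<billed>\d+) ms\s+"
    r"Memory Size: (?P<memory>\d+) MB\s+"
    r"Max Memory Used: (?P<max_memory>\d+) MB"
)


def parse_report(message: str) -> Optional[Dict[str, object]]:
    """Turn a REPORT log message into indexed fields, or None if it is not one."""
    match = REPORT_RE.search(message)
    if match is None:
        return None
    return {
        "request_id": match["request_id"],
        "duration_ms": float(match["duration"]),
        "billed_duration_ms": int(match["billed"]),
        "memory_mb": int(match["memory"]),
        "max_memory_used_mb": int(match["max_memory"]),
    }


def missing_invocations(invoked_ids: Iterable[str],
                        messages: Iterable[str]) -> Set[str]:
    """Return the invocation IDs that have no matching REPORT line yet."""
    reported = {r["request_id"] for r in map(parse_report, messages) if r}
    return set(invoked_ids) - reported
```

An empty result from `missing_invocations` corresponds to the "dataset is complete" check; a non-empty one means some logs have not propagated yet.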
diff --git a/spot/prices/README.md b/spot/prices/README.md
@@ -0,0 +1,12 @@
+## AWS Price Retriever
+The pricing scheme for serverless functions is retrieved using Infracost's Cloud Pricing API, which aggregates pricing information from major cloud vendors. A query to the API specifies the cloud vendor, service, product family, and region. SPOT uses it to fetch the pricing scheme for AWS Lambda, which varies by region and consists of a fixed price per request plus a duration price charged per GB-second (allocated memory multiplied by execution time). When the request succeeds, the price retriever parses the response and saves the request price, the duration price, and the region to the local database. This keeps the model's prices up to date and associates historical logs with the pricing scheme in effect when they were generated.
+
+### Example Pricing Scheme:
+```json
+{
+  "request_price": 2e-7,
+  "duration_price": 0.0000166667,
+  "timestamp": 163980957281,
+  "Region": "us-east-1"
+}
+```
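With a stored `request_price` and `duration_price`, the cost of a single invocation can be estimated as the request price plus the duration price times the GB-seconds consumed. The sketch assumes that `duration_price` is per GB-second (as for AWS Lambda) and that memory is converted from MB to GB by dividing by 1024. `invocation_cost` is a hypothetical helper, not part of SPOT.

```python
def invocation_cost(pricing: dict, memory_mb: int, billed_duration_ms: float) -> float:
    """Estimate the cost (USD) of one invocation from a stored pricing scheme.

    cost = request_price + duration_price * (memory in GB) * (duration in s)
    """
    gb_seconds = (memory_mb / 1024) * (billed_duration_ms / 1000)
    return pricing["request_price"] + pricing["duration_price"] * gb_seconds


pricing = {"request_price": 2e-7, "duration_price": 0.0000166667, "Region": "us-east-1"}
# One million invocations at 128 MB and 1000 ms each.
print(round(invocation_cost(pricing, 128, 1000) * 1_000_000, 2))  # prints 2.28
```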
diff --git a/spot/recommendation_engine/README.md b/spot/recommendation_engine/README.md
@@ -0,0 +1,2 @@
+## Recommendation Engine
+The recommendation engine uses the generated statistical model to predict costs for various configurations, then selects and recommends the configuration with the lowest cost. Finally, upon user request, it updates the serverless function's configuration on AWS to the recommended one.
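The selection step can be sketched as a search over candidate memory sizes: predict the duration for each, convert it to a cost, and keep the cheapest. `predict_duration_ms` stands in for SPOT's fitted statistical model and is hypothetical, as are the candidate sizes and the toy model in the usage example. The cost formula assumes Lambda's per-request plus per-GB-second pricing. Applying the result on AWS could use the Boto3 Lambda client's `update_function_configuration(FunctionName=..., MemorySize=...)`, which is left out here.

```python
from typing import Callable, Dict, Iterable, Tuple


def recommend_memory(
    predict_duration_ms: Callable[[int], float],
    candidate_sizes_mb: Iterable[int],
    pricing: Dict[str, float],
) -> Tuple[int, float]:
    """Return (memory_mb, predicted_cost) for the cheapest candidate size."""
    best = None
    for memory_mb in candidate_sizes_mb:
        gb_seconds = (memory_mb / 1024) * (predict_duration_ms(memory_mb) / 1000)
        cost = pricing["request_price"] + pricing["duration_price"] * gb_seconds
        if best is None or cost < best[1]:
            best = (memory_mb, cost)
    if best is None:
        raise ValueError("no candidate memory sizes given")
    return best


def toy_model(memory_mb: int) -> float:
    # Hypothetical: thrashes below 256 MB, otherwise a fixed 100 ms
    # plus a part that speeds up as memory (and CPU share) grows.
    if memory_mb < 256:
        return 2_000.0
    return 100 + 40_000 / memory_mb


pricing = {"request_price": 2e-7, "duration_price": 0.0000166667}
print(recommend_memory(toy_model, [128, 256, 512, 1024], pricing)[0])  # prints 256
```

Because cost is memory multiplied by duration, a larger size only wins when it shortens the run proportionally more than it adds memory, which is why the toy model's cheapest size sits just above its thrashing threshold.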
diff --git a/spot/visualize/README.md b/spot/visualize/README.md
@@ -0,0 +1,5 @@
+## Visualization Module
+### Error vs Epoch Graph
+The module plots error against epoch for the specified serverless function, showing how the error changes over successive fitting cycles.
+### Recommended Memory Size vs Epoch Graph
+The module plots the recommended memory size against epoch for the specified serverless function, showing how the recommended memory size changes over successive fitting cycles.
diff --git a/spot/visualize/SPOT_UML_Class_Diagram.jpeg b/spot/visualize/SPOT_UML_Class_Diagram.jpeg