
Add AppMesh metric integration test #247

Merged: 1 commit merged into aws-observability:terraform on Mar 31, 2021

Conversation

@bjrara (Member) commented Mar 23, 2021

Why do we need it?
Adds a Container Insights integration test that collects AppMesh metrics on EKS.

@bjrara force-pushed the terraform branch 2 times, most recently from 863d878 to 828a85e on March 24, 2021 05:21
@bjrara changed the title from "[WIP] Add AppMesh integration test" to "Add AppMesh integration test" on Mar 24, 2021
@pingleig (Member) left a comment

I didn't see aws_appmesh_mesh, so did you create the mesh without Terraform? You might want to add the instructions here.

Review thread on terraform/templates/local/docker_compose_from_source.tpl (outdated, resolved)
@bjrara (Member, Author) commented Mar 24, 2021

> I didn't see aws_appmesh_mesh, so did you create the mesh without Terraform? You might want to add the instructions here.

I don't create the AppMesh mesh using Terraform. All the test resources are created from a single YAML manifest:

```yaml
apiVersion: appmesh.k8s.aws/v1beta2
kind: Mesh
metadata:
  name: ${MESH_NAME}
spec:
  namespaceSelector:
    matchLabels:
      mesh: ${MESH_NAME}
```

@wyTrivail (Contributor) left a comment

just some small nits, thanks

5 review threads on terraform/eks-cloudwatch/app_mesh.tf (outdated, resolved)
import lombok.Data;

@Data
public class MetricContext {
A Contributor commented
Since this validator aims to support multiple components, not just CloudWatch, maybe giving it a concrete name would be better?

@bjrara (Member, Author) commented Mar 25, 2021

Changed to CloudWatchContext

A Contributor commented

would this context include "log" and "metric"? or just metric?

@bjrara (Member, Author) commented

Good question. Prometheus metrics are the only thing being tested. However, these metrics are pushed to CloudWatch as structured logs, which is why we test both metrics and logs.

}

@Override
public void validate() throws Exception {
A Contributor commented
What's the difference between the AppMesh validation and the normal CloudWatch metric validation? It seems we just do a simple check here. Is that intended, or do we have a "todo" for it?

@bjrara (Member, Author) commented Mar 25, 2021
Yes, I guess we need a discussion on how to merge the two validators. There are some presumptions in the original CWMetricValidator that break the container insight validations.

A Contributor commented
So what's the path forward now? Are we going to use this simple validation instead of making the validator more generic? I'd prefer the latter, but if we have to do this, let's add a "todo" here.

@pingleig (Member) commented Mar 30, 2021
I prefer we add a todo and deal with it later. This PR is already pretty big, and we are not enabling it until the related features merge into the adot repo. We have plenty of time to refactor these validators; both @pxaws and I will use the CloudWatch metrics and log validators for other container insight features, and we can figure out the generic approach along the way.

@bjrara force-pushed the terraform branch 9 times, most recently from c5301cd to 6385cf4 on March 26, 2021 22:48
@wyTrivail (Contributor) commented
Please let me know when you are ready for the code review :)

@bjrara force-pushed the terraform branch 2 times, most recently from 366f548 to 5315575 on March 27, 2021 02:18
@bjrara changed the title from "Add AppMesh integration test" to "[WIP] Add AppMesh integration test" on Mar 28, 2021
@bjrara changed the title from "[WIP] Add AppMesh integration test" to "Add AppMesh integration test" on Mar 28, 2021
@bjrara (Member, Author) commented Mar 28, 2021

> Please let me know when you are ready for the code review :)

Thanks for keeping track of the PR. Please review again.

@bjrara changed the title from "Add AppMesh integration test" to "Add AppMesh metric integration test" on Mar 30, 2021
Review thread on terraform/eks-cloudwatch/appmesh/appmesh.tf (outdated, resolved)
image_pull_policy = "Always"
args = [
"--config",
"/etc/eks/prometheus/config-all.yaml"]
A Member commented
Q: should we use config-all.yaml? Or make it a variable so that each test case (e.g. appmesh, nginx, etc.) applies only its own configuration?
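As a purely illustrative sketch of the suggestion above (the variable name below is hypothetical, not something in this PR), the config path could be turned into a per-test-case input:

```hcl
# Hypothetical: let each test case (appmesh, nginx, ...) supply its own collector config
variable "otconfig_path" {
  default = "/etc/eks/prometheus/config-all.yaml"
}

# ...and the container args would then reference it, e.g.
# args = ["--config", var.otconfig_path]
```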

Review thread on terraform/templates/defaults/validator_docker_compose.tpl (outdated, resolved)
@bjrara force-pushed the terraform branch 2 times, most recently from 68bdee0 to 19ba1a9 on March 30, 2021 20:27
@pingleig added this to the v0.9.0 milestone on Mar 30, 2021
@wyTrivail (Contributor) left a comment

My major concern is still that we are creating a new folder for eks. I'm still thinking about how we can minimize the duplicated parts, for example:

  1. we are using testing_id to create the namespace,
  2. we are deploying aoc with a configmap,
  3. we will need to support the mocked server, which is already supported in the eks folder today.

The eks folder today can serve all the test cases; cortex is just an optional parameter. This file lists all the test cases that can run via the eks folder.

If we want to have a separate folder for some test cases, I'd ask myself:

  1. if the difference is in the ot config, can we just configure it in the testcase?
  2. if the difference is in how the otcollector is deployed, why can't we add a new "starting mode" to the eks folder?
  3. if the difference is in how the sample apps are deployed, why can't we make the sample-app deployment configurable, or add a file under the eks folder?
  4. if the difference is in validation, can we just configure the parameters in the testcase?
  5. how hard would it be to make the eks folder more generic?

If we can have a plan for how we maintain the separate folders, either solid reasons to keep them separate or a vision to merge them someday, I will be okay with that.

# permissions and limitations under the License.
# -------------------------------------------------------------------------

variable "provider_url" {
A Contributor commented
Better to put the variables in variables.tf.

@bjrara (Member, Author) commented Mar 30, 2021
To make sure we're on the same page: appmesh works as an independent, stateless module that only accepts input variables and exports outputs to be used by its caller. It doesn't itself rely on any other module, which is why we don't put its variables in a common place.
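For illustration only, a minimal sketch of such a self-contained module boundary, with hypothetical file names, variable names, and values (the actual expressions in this PR may differ):

```hcl
# appmesh/variables.tf (hypothetical): the module declares only its own inputs
variable "mesh_name" {
  default = "aoc-test-mesh" # made-up default
}

# appmesh/outputs.tf (hypothetical): values exported back to the caller
output "metric_dimension_namespace" {
  value = var.mesh_name # the real expression may differ
}

# Caller (hypothetical): wires the module purely through inputs and outputs
module "appmesh" {
  source    = "./appmesh"
  mesh_name = "aoc-test-mesh"
}
```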

}

variable "sample_app_image_repo" {
type = string
A Contributor commented
I wonder, do we need to specify the type here, since string is the default type?

A Contributor commented
And I remember this is a var in common.tf; what's the reason for creating one here?

@bjrara (Member, Author) commented Mar 30, 2021

Also, the variables in common.tf are poorly documented; I'm not sure how they are used and how they should be used. As you may see, appmesh requires multiple app images, and a single variable named sample_app_image obviously can't fulfill that need.

A Contributor commented
Maybe you want to add a comment above sample_app_image_repo to explain that it's for multiple app images, which I was not aware of when I read this code.
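For illustration of the multi-image point discussed above (the description text and image tags are invented, not the ones used in this PR), a documented repo variable could be expanded into several image references:

```hcl
# Hypothetical sketch: one repo variable, several sample-app images derived from it
variable "sample_app_image_repo" {
  description = "ECR repository hosting all of the AppMesh sample-app images"
  default     = ""
}

locals {
  # Image tags below are invented for illustration only
  front_app_image = "${var.sample_app_image_repo}:front"
  color_app_image = "${var.sample_app_image_repo}:color"
}
```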

default = ""
}

variable "region" {
A Contributor commented
It looks like many vars duplicate the ones in common.tf?


default = "kubeconfig"
}

output "metric_dimension_namespace" {
A Contributor commented

we use it for debugging?


}

variable "sample_app_image_repo" {
default = "611364707713.dkr.ecr.us-west-2.amazonaws.com/otel-test/container-insight-samples"
A Contributor commented
Where's the source code of this sample app? I wonder if we could put it under the sample-apps folder so that we can manage it.

@bjrara (Member, Author) commented

The source code can be found by following our official setup guidance; it is maintained in another AWS repo: https://github.com/aws/aws-app-mesh-examples/tree/master/walkthroughs/howto-k8s-http-headers.
Ref:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights-Prometheus-Sample-Workloads-appmesh-EKS.html

I didn't put it under the sample-apps folder because I personally think:

  1. It increases the effort to sync this copy with the original one.
  2. I can't see any potential user being interested in running the integration tests locally. If this image is only used by the aoc integration test, I don't see a strong reason to maintain it ourselves. What's the benefit?
  3. We don't aim to test the sample apps; they serve to test aoc. A stable version of the sample app is enough to meet the requirement.

A Contributor commented

> 1. It increases the effort to sync this copy with the original one.

Makes sense.

> 2. I can't see any potential user being interested in running the integration tests locally. If this image is only used by the aoc integration test, I don't see a strong reason to maintain it ourselves. What's the benefit?

Who is maintaining this sample app now? Does it have a workflow for auto-build? How do we track the version?

> 3. We don't aim to test the sample apps; they serve to test aoc. A stable version of the sample app is enough to meet the requirement.

Would we need a performance test?

A Contributor commented
I wonder, if this sample app is just for the aoc integration test, why don't we move it into this repo? I'm okay with keeping it external if the sample app has usages beyond the aoc integration test.

@bjrara (Member, Author) commented Mar 30, 2021

> Who is maintaining this sample app now? Does it have a workflow for auto-build?

I believe it's the EKS team, judging by the contributors. Per the README, they've provided a deploy script that builds the image and deploys the related resources to the cluster (cleanup still has to be taken care of by users).

> How do we track the version?

The images are sample apps; we don't need version tracking, or so I thought.

> Would we need a performance test?

I'd like to see that happen. We don't have a concrete plan yet, but that's unrelated to the sample-app images IIUC.

> If this sample app is just for the aoc integration test, why don't we move it into this repo?

Apparently it's not just for the aoc integration test; it is recommended by CloudWatch for the AppMesh quick start.

@mxiamxia (Member) commented Mar 31, 2021
Please help by leaving the source code link in a comment.

@bjrara (Member, Author) commented
Added the definition and source-code links to the variable.
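For illustration, the documented variable might look roughly like this (the description wording is hypothetical; the default and the linked sources are the ones mentioned earlier in this thread):

```hcl
variable "sample_app_image_repo" {
  # Hypothetical wording; points readers at the upstream sample-app source
  description = "Sample-app images built from aws/aws-app-mesh-examples (walkthroughs/howto-k8s-http-headers); see the CloudWatch Container Insights Prometheus sample-workload docs"
  default     = "611364707713.dkr.ecr.us-west-2.amazonaws.com/otel-test/container-insight-samples"
}
```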

@@ -26,6 +26,7 @@ services:
- "ecsTaskDefFamily=${ecs_taskdef_family}"
- "--ecs-context"
- "ecsTaskDefVersion=${ecs_taskdef_version}"
- "--cloudwatch-context=${metric_dimension_json}"
A Contributor commented

are these dimension values from eks?

@bjrara (Member, Author) commented Mar 31, 2021
I'm hoping containerinsight-ecs-prometheus will reuse this configuration too.
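For illustration only (this is not the actual wiring in the PR; the dimension keys and values below are invented), the JSON context handed to the template could be assembled in Terraform like this:

```hcl
# Hypothetical sketch: build a JSON dimension map and pass it to the docker-compose template
locals {
  metric_dimension_json = jsonencode({
    Namespace   = "test-namespace" # invented values
    ClusterName = "test-cluster"
  })
}

# e.g. templatefile("validator_docker_compose.tpl", { metric_dimension_json = local.metric_dimension_json })
```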

@@ -0,0 +1,2 @@
# this file is defined in validator/src/main/resources/validations
validation_config="eks-container-insight.yml"
A Contributor commented
I guess this folder is just for eks container insight, not for ecs?

@bjrara (Member, Author) commented Mar 31, 2021

Renamed the folder

@wyTrivail (Contributor) left a comment
Ship it. I still have many questions, which I hope we address by opening an issue though.

@bjrara (Member, Author) commented Mar 31, 2021

> (quoting @wyTrivail's review comment above about the new eks folder, minimizing duplication, and making the eks folder more generic)

Adding some thoughts for people who are interested in this comment. Please note this is still an open question that needs to be addressed in the future.

Correction:
The otel configuration used in the integration test is NOT a testing config, but the config that will be released to users for container insight on EKS. It is built into the image and directly accessible, so we're NOT deploying the otel collector with configmaps, and no extra setup is required when deploying aoc in the testing cluster.

Can containerinsight-eks, as well as containerinsight-ecs, be merged into the existing modules?
Before this question can be answered, I would personally like to see best-practice guidance for adding new test cases on different platforms. For instance, is eks used to validate the main functions from the platform perspective, or does it serve a more general purpose and hold all the validations that happen on EKS? Is the current design/implementation capable of that? What's the minimum test set of the eks module? How are different submodules isolated during runtime? Is it possible to run only the subset I'm interested in? What are the semantics of the common variables? Docs?

Without these instructions and agreed contracts, it's hard for contributors to follow what is unspoken.

@wyTrivail (Contributor) commented
> (quoting the review comment above and @bjrara's reply about merging containerinsight-eks into the existing modules)

  1. When we designed this framework, we tried to decouple the ot config, sample app, validation, and computing platform, since a test case is constructed from those four. When we define a test case in the testcases folder, we define the ot config, the sample app, the validation config, and the platform the test case will run on. It's not about whether eks should hold all the validations that happen on EKS, because validation and computing platform are decoupled.

  2. Is the current design/implementation capable of doing that? At least it has covered all the test cases so far. As maintainers, we know it's not mature and understand that new types of test cases will keep coming in anyway, and we need to keep improving it gradually.

  3. What's the minimum test set of the eks module? Again, we tried to decouple that. The eks module is a module that can deploy the collector and sample apps; if there's a test case we can't cover, we need to think about which part is missing.

  4. How are different submodules isolated during runtime? What kind of isolation? There should be no isolation issue, since each test case runs separately in a different container, and I don't think they need to be isolated.

  5. What are the semantics of the common variables? Docs? Could you point out which parameters are confusing so that we can improve them?

What's the biggest motivation for creating a new folder? Can we define more test cases in this folder?

@mxiamxia (Member) left a comment

LGTM. thx

@mxiamxia merged commit e146b07 into aws-observability:terraform on Mar 31, 2021
@bjrara (Member, Author) commented Mar 31, 2021

Just to make myself clear, I'm not opposed to merging the container insight test cases into eks. I'm just proposing more requirements for the framework, specifically around documentation.
As you said,

> if there's a contributor who wants to add a new testcase, how do we guide him to add it?

They clearly wouldn't be guided by my PR, but by HOW-TO documents.

> 4. How are different submodules isolated during runtime? What kind of isolation? There should be no isolation issue, since each test case runs separately in a different container, and I don't think they need to be isolated.
> 5. What are the semantics of the common variables? Docs? Could you point out which parameters are confusing so that we can improve them?
>
> What's the biggest motivation for creating a new folder? Can we define more test cases in this folder?

Adding more thoughts on the testing:

  1. Isolation means the ability to run one or multiple test scenarios at a time. Users should be able to trigger a subset of the provided test cases while requesting only the minimum resources. Assuming containerinsight is eventually merged into eks, can it run without creating "cortex" and "sample_app_deployment"?
     Furthermore, whether test cases can be run locally in separate containers depends on where the docker-compose file is created. I'm not sure about the GitHub flow, but in my local environment, if they live in the same folder, running them in different containers means they share the same network and fail when run concurrently. That's another thing we need to take care of in the follow-up actions.

  2. I think a clear definition of the common variables is always a good way to avoid misunderstanding and confusion. I just don't feel safe using any variable I don't completely understand: https://github.com/aws-observability/aws-otel-test-framework/blob/main/terraform/common.tf.

@bjrara (Member, Author) commented Mar 31, 2021

> Ship it. I still have many questions, which I hope we address by opening an issue though.

Created an issue: #258. I thought merging the folders was the only thing left unresolved, isn't it?

@wyTrivail (Contributor) commented
> Created an issue: #258. I thought merging the folders was the only thing left unresolved, isn't it?

And merging the validation part, I guess?

@bjrara (Member, Author) commented Mar 31, 2021

> And merging the validation part, I guess?

Yup. Added. Thanks!

@wyTrivail (Contributor) commented
> (quoting point 1 of @bjrara's comment above about isolation and running test cases concurrently)

I'm not sure it's easy to run multiple tests together; we made the assumption that each test runs separately, and we rely on the GitHub workflow for parallelization.
But we should be okay with using a subset of functions via flags. Right now, for example, "cortex" is optional; we can define different deploy modes for sample apps and allow the sample-app deployments to be plugged into the eks module, so that a contributor only needs to develop the sample-app deployment part rather than the whole eks setup.

@bjrara (Member, Author) commented Mar 31, 2021

> (quoting the exchange above about isolation and @wyTrivail's reply about plugging sample-app deployments into the eks module)

A separate project name needs to be assigned when executing docker-compose in order to run tests simultaneously. It can be fixed during the layout refactoring, as long as we agree on it.

@wyTrivail (Contributor) commented Mar 31, 2021

> (quoting the exchange above, ending with @bjrara's note about assigning a separate docker-compose project)

You could probably check https://github.com/nektos/act, which lets us run the workflow locally. It was not very mature last year (credentials could not be used); I'm not sure whether it's okay to use now.

We also need to think about whether we really have a requirement to run tests simultaneously locally. Normally we develop one test case at a time and can use the workflow to run the regression tests. Resource-consumption-wise, even if the tests run together, the total resources are the same: each sample app/collector still takes its own resources. Furthermore, since these are short-running tests, I'm not sure we really care about the consumption. We might care for soaking and performance tests, but it's better to isolate those at the instance level so that we get better tuning results. So far I don't see much benefit in running tests simultaneously; happy to hear any ideas on it.

@bjrara (Member, Author) commented Mar 31, 2021

> (quoting the exchange above about running tests simultaneously, act, and resource consumption)

Some statistics:
A previous discussion can be found here: #247 (comment)
For each workload in containerinsight, setup (installing resources in EKS) takes about 1 minute, the docker compile about 2 minutes, and waiting for logs and metrics to be ready about 3 minutes.
If the workloads were run one by one in separate test cases (although I took a different approach), it would take about 30 minutes to run the entire suite; running them concurrently takes about 6 minutes.

We had 6 nodes in the testing cluster. Running all the containerinsight workloads requires the successful creation of 19 Pods (sample apps, traffic generators, component control plane). How many integration flows can be executed at one time without risking a resource shortage?

If contributors want to run on their own EKS cluster, how many nodes do we recommend they create for our framework? They pay for it.

@bjrara (Member, Author) commented Mar 31, 2021

@wyTrivail Thanks for the careful review and the patient discussion with me about the design and what the framework should/will be! It was a great honour to share opinions with you. But I think there's too much information in this PR already, and I'd like to end the discussion for now. I believe most of these topics can be better understood by future maintainers.
