Testing Methodology #43

Open
zscole opened this issue Aug 15, 2018 · 3 comments

Comments

@zscole
Contributor

zscole commented Aug 15, 2018

Overview
The network is segmented into X number of shards. Every ~10 minutes, validators are randomly assigned to a shard, so the stress point is observing and testing the ability of validators to subscribe to new topics and to send/receive messages on those topics within an adequate amount of time.

Test Utility
We will perform tests using the Whiteblock testing platform. The following functionalities would likely be most relevant to this particular test series (a rough sketch of how these parameters might be templated per node follows the list):

  • Number of nodes: <100 (more if necessary)
  • Automated provisioning of nodes
  • Behaviors, parameters, commands, and actions can be automated or assigned to individual nodes or to the network as a whole.
  • Bandwidth: 1G (standard, up to 10G if necessary) can be configured and assigned to each individual node.
  • VLAN: Each node can be configured within its own VLAN and assigned a unique IP address, allowing for the emulation of realistic, real-world network conditions.
  • Latency: Up to 1 second of network latency can be applied to each node’s individual link.
  • Data aggregation and visualization
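
As an illustration of how these knobs might be combined into a reusable per-node template, here is a minimal sketch; the field names and values are assumptions for illustration, not Whiteblock's actual configuration schema:

```python
# Hypothetical per-node configuration template; field names are illustrative,
# not Whiteblock's actual schema.
node_template = {
    "image": "sharding-p2p-poc:latest",  # Docker image used to provision the node (assumed tag)
    "count": 100,                        # number of nodes to deploy
    "vlan_per_node": True,               # each node isolated in its own VLAN with a unique IP
    "bandwidth": "1G",                   # per-node link bandwidth (up to 10G)
    "latency_ms": 0,                     # artificial latency applied to the node's link
    "packet_loss_pct": 0.0,              # artificial packet loss on the node's link
}
```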

Test Scenarios

  • Observe and measure performance under various network conditions.
    • Latency between nodes:
      • What is the maximum amount of network latency each individual node can tolerate before performance begins to degrade?
      • What are the security implications of high degrees of latency?
      • Are there any other unforeseen issues arising from network conditions that we can't account for in code?
    • Latency between shards.
    • Intermittent blackout conditions
    • High degrees of packet loss
    • Bandwidth constraints (various bandwidth sizes)
  • Introduce new nodes to network:
    • Add/remove nodes at random.
    • Add/remove nodes at set intervals.
    • Introduce a high volume of nodes simultaneously.
  • Partition tolerance
    • Prevent segments of nodes from communicating with one another.
  • Measure the performance of sending/receiving messages within set time periods and repeat for N epochs (see the measurement sketch after this list).
  • Observe process of nodes joining and leaving shards
    • Subscribing to shards
    • Connecting to other nodes within shard
    • Synchronize collations and ShardPreferenceTable
    • Unsubscribing from shards
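
As a minimal sketch of the message-timing measurement above: `subscribe_shard`, `broadcast`, `wait_for_message`, and the node handles are assumed wrappers around whatever CLI the nodes expose, not names from the existing codebase.

```python
import time

def measure_epoch(nodes, shard_id, payload=b"ping"):
    """Time how long each node takes to join a shard topic and see a message."""
    results = {}
    for node in nodes:
        t0 = time.monotonic()
        node.subscribe_shard(shard_id)            # hypothetical: join the new shard topic
        results[node.id] = {"subscribe_s": time.monotonic() - t0}

    sender, receivers = nodes[0], nodes[1:]
    t0 = time.monotonic()
    sender.broadcast(shard_id, payload)           # hypothetical: send one message on the topic
    for node in receivers:
        node.wait_for_message(shard_id, payload)  # hypothetical: block until the message arrives
        results[node.id]["deliver_s"] = time.monotonic() - t0
    return results

# Repeat for N epochs and aggregate, e.g.:
# epochs = [measure_epoch(nodes, shard_id) for _ in range(N)]
```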

Need to Define

  • Configuration specifications for relevant test cases to create templates which allow for quicker test automation.
  • Code which should be tested.
  • Preliminary testing methodology should be established based on everyone's input.
    • We can make adjustments to this methodology based on the results of each test case.
    • It's generally best (in my experience) to create a high-level overview which provides a more granular definition of the first three test cases and then make adjustments to each subsequent test series based on the results of those three.

Other Notes

  • This document acts as a high-level overview to communicate potential test scenarios based on our initial assumptions of the existing codebase. It is meant to act as a starting point to guide the development of a preliminary test series.
  • Although tests will be run locally within our lab, access to the test network can be granted to appropriate third parties for the sake of due diligence, validation, or other purposes as deemed necessary.
  • The net stats and performance data dashboard can be assigned a public IP to allow for public access.
  • Raw and formatted data will be shared within appropriate repos.
  • Please voice any suggestions, comments, or concerns in this thread and feel free to contact me on Gitter.
@mhchia
Collaborator

mhchia commented Aug 16, 2018

Good work! I think it makes sense. Thanks a lot for this document.

Some questions:

Test Utility

  • Do we need to modify our code in order to do the emulation?
  • What do you mean by "Automated provisioning of nodes"?

Test Scenarios

  • We might also need to consider the network topology. Though at the beginning we can just spin up multiple bootnodes and let the other nodes bootstrap through them.

Need to define

Configuration specifications for relevant test cases to create templates which allow for quicker test automation.

Sorry, I don't get what you mean here. Can you elaborate more?

Code which should be tested.

My thought is to test with the Docker image, so there is one ./sharding-p2p-poc process running inside each Docker container, and we just use the CLI commands to control it.
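
A rough sketch of that setup, driving plain `docker` commands from Python; the image name and the subcommands passed to `docker exec` are placeholders rather than the actual sharding-p2p-poc interface:

```python
import subprocess

def start_node(name, image="sharding-p2p-poc"):
    # One ./sharding-p2p-poc process per container; the image name is assumed.
    subprocess.run(["docker", "run", "-d", "--name", name, image], check=True)

def node_cli(name, *args):
    # Drive the node through its CLI inside the container; the binary path and
    # subcommands are placeholders for whatever the PoC actually exposes.
    out = subprocess.run(["docker", "exec", name, "./sharding-p2p-poc", *args],
                         capture_output=True, text=True, check=True)
    return out.stdout
```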

Preliminary testing methodology should be established based on everyone's input.
We can make adjustments to this methodology based on the results of each test case.
It's generally best (in my experience) to create a high-level overview which provides a more granular definition of the first three test cases and then make adjustments to each subsequent test series based on the results of those three.

Agree. Currently, we use the CLI to communicate with the running nodes, and from the results we see what happens after every command. My thought is to change the results to be easier to parse, and then construct tests as shell scripts in a first step. Do you have any ideas on a testing methodology for our case?
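
For the easier-parsing idea, one option (an assumption, not something the PoC does today) is to have each CLI command emit a single JSON object that the test scripts can check, e.g.:

```python
import json

def check_subscription(raw_output, expected_shard):
    # Assumes the CLI is changed to print one JSON object per command,
    # e.g. {"ok": true, "subscribed_shards": [0, 1]} (an assumed format).
    result = json.loads(raw_output)
    assert result["ok"], f"command failed: {result}"
    assert expected_shard in result["subscribed_shards"]
    return result
```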

BTW, I modified your comment to fix the Whiteblock URL.

@zscole
Contributor Author

zscole commented Aug 16, 2018

Do we need to modify our code in order to do the emulation?

No, our platform is meant to accommodate the code; the idea is that the code shouldn't need to accommodate the platform. As long as it functions, we should be able to just throw it on the platform and get it going.

What do you mean by "Automated provisioning of nodes"?

The ability to deploy a specified number of fully configured nodes based on the Dockerfile and, if needed, apply some sort of scheduling logic to this process. Our platform functions similarly to Kubernetes and other orchestration/infrastructure automation utilities. We developed a custom wrapper for Docker that was purpose-built for blockchain-specific containerization.

We might also need to consider the network topology? Though at the beginning we can just spin up multiple bootnodes, and let the other nodes bootstrap through the bootnodes.

Can you please clarify your definition of network topology in this context? Part of the node deployment module includes the ability to assign nodes to independent VLANs and assign IP addresses. This provides the ability to configure and control links between nodes (either individual links or the network as a whole), such as implementing varying degrees of latency, bandwidth, packet loss, etc.
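
For reference, this kind of per-link impairment is commonly implemented with Linux `tc netem`; a minimal sketch, under the assumption that each node is a container with `tc` available on `eth0` and NET_ADMIN capability:

```python
import subprocess

def impair_link(container, delay_ms=100, loss_pct=1.0, rate="1gbit"):
    # Apply latency, packet loss, and a bandwidth cap to the container's eth0
    # via tc/netem; the interface name, installed tc binary, and NET_ADMIN
    # capability are all assumptions.
    subprocess.run([
        "docker", "exec", container, "tc", "qdisc", "add", "dev", "eth0", "root",
        "netem", "delay", f"{delay_ms}ms", "loss", f"{loss_pct}%", "rate", rate,
    ], check=True)
```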

Sorry I don't get what you mean here. Can you elaborate more?

We can establish an automated workflow, kind of like you would with Chef. Once we flesh out definitions for each test case within the series, we can use these definitions to create a configuration file which allows for scheduling and automation.
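
A sketch of what such a configuration-driven series could look like; the field names and the expansion logic are illustrative, not an existing Whiteblock format:

```python
import itertools

# Illustrative test-series definition; in practice this would live in a config
# file (YAML/JSON) and be fed to the scheduler.
test_series = {
    "nodes": 50,
    "latency_ms": [0, 100, 500, 1000],
    "packet_loss_pct": [0.0, 1.0, 5.0],
    "epochs": 10,
}

def expand_cases(series):
    # One scheduled test case per combination of network conditions.
    for latency, loss in itertools.product(series["latency_ms"],
                                           series["packet_loss_pct"]):
        yield {"nodes": series["nodes"], "latency_ms": latency,
               "packet_loss_pct": loss, "epochs": series["epochs"]}
```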

Do you have any idea on any testing methodology in our case?

Let's talk about this offline so we can brainstorm!

@mhchia
Collaborator

mhchia commented Aug 17, 2018

Can you please clarify your definition of network topology in this context?

I mean the topology of our overlay network, instead of the one in the TCP/IP layer.
We might want to test

  • Nodes are connected in some specific manner. This is what I referred to as the "topology".
  • Nodes join the network by connecting to bootnodes and asking them for new peers at random.

Do these two cases make sense? (A small sketch of both cases follows.)
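
To make the two cases concrete, here is a small sketch; `connect` and `known_peers` are stand-ins for whatever peer-dialing commands the nodes actually expose:

```python
import random

def ring_topology(nodes):
    # Case 1: connect nodes in a specific manner (here, a simple ring).
    for a, b in zip(nodes, nodes[1:] + nodes[:1]):
        a.connect(b)                                        # hypothetical peer-dial helper

def bootnode_bootstrap(nodes, bootnodes, peers_per_node=3):
    # Case 2: every node dials the bootnodes, then asks one of them for
    # random peers to connect to.
    for node in nodes:
        for boot in bootnodes:
            node.connect(boot)
        candidates = random.choice(bootnodes).known_peers()  # hypothetical
        for peer in random.sample(candidates, min(peers_per_node, len(candidates))):
            node.connect(peer)
```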

We can establish an automated workflow kind of like you would using Chef. Once we flush out definitions for each test case within the series, we can use these definitions to create a configuration file which allows for scheduling and automation.

Got it. It makes sense.

Thank you for the detailed explanation!
