Unblock the testing for go-ipfs 0.5.0 release #196

Closed
2 of 12 tasks
daviddias opened this issue Dec 2, 2019 · 7 comments

@daviddias
Contributor

daviddias commented Dec 2, 2019

The primary goal for the TestGround MVP is to unblock the testing for the go-ipfs 0.5.0 release. To achieve that goal, @Stebalien, as the IPFS Tech Lead, set the requirement: test the "Stabilize the DHT" branch(es) of go-libp2p-kad-dht. This requirement was identified during the Co-Location Hack Week in early November.

The current state of TestGround and where we are headed are described in the State of TestGround, mid November Q4 2019 report.

TL;DR: to deliver on this goal, we need to:

  • Expand the TestGround Infrastructure so that it can run multiple thousands of nodes (owner: @nonsense)
  • Add support for Traffic Shaping & Network Configuration to TestGround (owner: @Stebalien)
  • Instrument go-libp2p so that the DHT operations are observable and traced in a Distributed Context (owner: @raulk)
  • Create a TestGround Dashboard that helps the developers of go-ipfs & go-libp2p grok the results output by every node in a TestGround run (owner: @daviddias)
  • Keep iterating on the DHT Test Plan (owner: @aschmahmann)

Other currently active endeavors, not on the critical path, are:

@momack2
Contributor

momack2 commented Dec 2, 2019

I think there is more than this, no? @Stebalien mentioned needing to run multiple different go-ipfs versions within a test plan to ensure the "partial upgrade" state works as well.

In addition to iterating on the DHT test plan, we need to actually be able to run it at scale -- and likely run it against a set of different branches with the core pieces in the stabilize branch + different feature additions slated for 0.5.0 to see how performance and functionality change. I imagine just doing that round-trip will take a number of days and would ideally be divided between different folks to share the load.

@daviddias
Contributor Author

> I think there is more than this, no?

That's correct, the five top-level items cover a ton of work that needs to be done. I'll keep the Epic (using ZenHub) fresh as we unpack and pave the path for the features and bugs that need to be taken care of.

> @Stebalien mentioned needing to run multiple different go-ipfs versions within a test plan to ensure the "partial upgrade" state works as well.

TestGround has offered the functionality to use a specific version using the --dep parameter since mid September 2019. Fun fact, it was the only way I could make progress on the DHT Test Plan (https://github.com/ipfs/testground/tree/master/plans/dht) during the Hack Week in early November, since go-libp2p-kad-dht was in this weird state where the master branch was not the release and I had to pull the version from master.

What I believe @Stebalien is looking for is an agreement on how to describe, in a declarative way, which versions should be tested, so that TestGround can do that automatically rather than having a user run the plan multiple times with a different --dep= each time.

We discussed it during the previous Weekly and outlined 3 options for how to do it. The agreed next step was for @Stebalien to open an issue describing these (https://github.com/ipfs/testground/pull/188/files#r353018068) so that when @raulk was back, he could greenlight the way to go. You can watch the recording for the details of that discussion.

Once that issue is created and that work is unpacked, we can then have an issue that outlines it and include it in the Epic.

> In addition to iterating on the DHT test plan, we need to actually be able to run it at scale... I imagine just doing that round-trip will take a number of days and would ideally be divided between different folks to share the load.

You are correct. Debugging a problem, or verifying that the lack of problems is not a false positive, will take multiple days to weeks of training the core developers to use the tool (just like a network engineer gets training in Wireshark), refining the tests, tweaking values, adjusting how we observe the output of the tests, and so on.

In this issue, we are explicitly tracking only what the TestGround team needs to land for the go-ipfs team to be capable of landing a go-ipfs release. The actual work that the go-ipfs team will need to perform (akin to a Release Checklist) should (IMHO) be tracked in a go-ipfs issue, potentially pinned at the top so that users are informed in the right location.

@Stebalien Stebalien mentioned this issue Dec 3, 2019
21 tasks
@daviddias daviddias changed the title from "Unblock the go-ipfs 0.5.0 release" to "Unblock the testing for go-ipfs 0.5.0 release" Dec 5, 2019
@daviddias
Contributor Author

@Stebalien
Contributor

Fix bugs in the DHT

As we work on the testground test cases, we're finding and fixing bugs in the DHT (even when used at small scales). Tracking them here:

@daviddias
Contributor Author

We bring news 📰

The Testground team had the chance to spend 10 days working together and onboarding our new contributors, @hacdias & @nonsense! It was 10 days full of delicious delicacies 🧀, lots of ☕️ and tons of hacking 💻! We have a ton of learnings to share with you, but for now, we want to give you a quick update on Testground and how it relates to the go-ipfs goal of releasing 0.5 (ipfs/kubo#6776).

What you can do today with Testground

  • ✅ Create a Distributed Test Plan with multiple parameterized test cases
  • ✅ Run the test against multiple versions of the module you are testing (using the --dep flag)
  • ✅ Spin up a test with ~400 instances (e.g. ~400 libp2p nodes) on a single machine[1] (using --runner=local:go)
  • ✅ Spin up a test with ~90 instances (e.g. ~90 libp2p nodes) on a single machine using docker[1] (using --runner=docker:go)
  • ✅ Deploy your own Docker Swarm Cluster in AWS in self-service mode (instructions provided at /testground/infra/docker-swarm/README.md)
  • ✅ Instrument your test and collect metrics. These can then be found in an ElasticSearch service.
  • ✅ Create Network Configurations using sidecar, setting each node's connection latency, bandwidth and jitter.

[1] Your mileage may vary depending on your developer environment.

What you will get the chance to do in the next iteration (aka what you can’t do today)

  • ⏳ Spin up a test with 500 instances (e.g. 500 libp2p nodes) in a Docker Swarm Cluster deployed in AWS (using --runner=cluster:swarm, with infra based on the ./testground/infra/docker-swarm playbooks)
  • ⏳ Spin up multiple thousands of nodes
  • ⏳ Set Network topologies
  • ⏳ Isolate the nodes behind NATs
  • ⏳ Use a dashboard that helps you better understand what the logs of your test indicate

Also related: our friends from the libp2p team are working on adding instrumentation to libp2p so that we can benchmark, monitor and ideally trace the multiple interactions between nodes.

The Test Plans that we have ready for you to try

Today, you can try:

With more coming up soon (i.e. they are not blocked):

How to get started

If this is your first time looking at Testground, do not worry, it is simple. Just follow the steps on https://github.com/ipfs/testground/blob/master/docs/USAGE.md and you will be set!

Please let us know of your experience experimenting with Testground. Report bugs as issues.

Thank you so much for your attention. Happy testing!
The Testground team 🎄

@raulk
Contributor

raulk commented Mar 3, 2020

Update.

The IPFS Content Routing improvement efforts are well underway. Have a look at the flurry of activity in the go-libp2p-kad-dht repo.

At Testground, we are working very closely with the IPFS team to serve their needs for iterative, experimental debugging workflows. This phase of the development of Testground is called Testground Maturity Stage 1.

During IPFS Team Week, we defined our mission to be the following:

[image: Testground mission statement]

This is our roadmap:

[image: Testground roadmap]

And this is what's in scope for Maturity Stage 1:

[two images: scope of Maturity Stage 1]

We are working very hard to deliver on these features and expectations.

So far, it's going well.

@raulk raulk unpinned this issue Mar 31, 2020
@raulk raulk added this to the Testground v0.4 milestone Apr 3, 2020
@raulk
Contributor

raulk commented Apr 3, 2020

For a few weeks now, the go-ipfs team has been using Testground for iterative/experimental workflows with a reasonable level of success to validate the Content Routing and Bitswap changes.

Of course, things show up every day, and those are being filed as individual issues and acted on promptly by the Testground team. But it's fair to say that the Content Routing testing efforts have been unblocked for a while.

So far, the Content Routing WG has been able to execute 2k-instance tests with the cluster:k8s runner.

We continue to harden Testground, particularly the k8s infrastructure and the sync service, to reach our ultimate goal of running 10k-instance tests reliably. Those efforts are being tracked in #599.

In a nutshell, we have made a lot of progress, to the point where we are no longer blocking the ongoing tests of go-ipfs v0.5.0. The finer-grained state of affairs is captured on the ZenHub board.

Accordingly, I'm closing this issue and urge folks to visit the ZenHub board for a more up-to-date view of reality.

@raulk raulk closed this as completed Apr 3, 2020