Roadmap v2 #410

Closed
wants to merge 26 commits into from

pepoviola
Collaborator

Brainstorming tasks for the v2 roadmap.

Roadmap Zombienet v2

## Infra
- Chaos testing, add examples and explore possibilities in `native` and `podman` providers
Contributor

This is a top priority for parachains. We want to roll out a separate CI pipeline for these long-duration tests.

@@ -0,0 +1,35 @@
Roadmap Zombienet v2

## Infra
Contributor

@sandreim Sep 20, 2022

Prometheus server deployment, enabling assertions against Prometheus queries.

  • removes the code and inefficiency of scraping metrics ourselves
  • scales better
  • gives us a standardized way of querying metrics and the ability to do aggregations as we see fit
  • (nice to have) create some test reports using these, maybe alarms for long-duration failure conditions
  • improves local debuggability

Open question:

  • do we use Prometheus for native/local runs, or just k8s? (If this differs across providers, Prometheus queries won't work.)
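A minimal TypeScript sketch of what "asserting against Prometheus queries" could look like, assuming a Prometheus server is reachable at some base URL. It relies on the standard Prometheus HTTP API instant-query endpoint (`/api/v1/query`) and its response shape; the helper names `instantQueryUrl` and `assertVector` are hypothetical, and fetching the URL (e.g. with `fetch`) is left to the caller so the assertion logic stays provider-agnostic.

```typescript
// Subset of the Prometheus HTTP API response for an instant query (resultType "vector").
interface PromVectorSample {
  metric: Record<string, string>; // label set identifying the series
  value: [number, string]; // [unix timestamp, sample value encoded as a string]
}

interface PromQueryResponse {
  status: "success" | "error";
  data?: { resultType: string; result: PromVectorSample[] };
  error?: string;
}

// Hypothetical helper: build the instant-query URL for a PromQL expression.
function instantQueryUrl(baseUrl: string, promql: string): string {
  const url = new URL("/api/v1/query", baseUrl);
  url.searchParams.set("query", promql);
  return url.toString();
}

// Hypothetical helper: check that every returned series satisfies `predicate`
// and return the parsed sample values.
function assertVector(
  resp: PromQueryResponse,
  predicate: (value: number) => boolean,
): number[] {
  const data = resp.data;
  if (resp.status !== "success" || !data) {
    throw new Error(`query failed: ${resp.error ?? "unknown error"}`);
  }
  if (data.resultType !== "vector") {
    throw new Error(`expected a vector result, got ${data.resultType}`);
  }
  if (data.result.length === 0) {
    throw new Error("query returned no series");
  }
  return data.result.map((sample) => {
    const value = Number(sample.value[1]);
    if (!predicate(value)) {
      throw new Error(`series ${JSON.stringify(sample.metric)} failed with value ${value}`);
    }
    return value;
  });
}
```

With this split, aggregations live in the PromQL expression itself (for example wrapping a per-node metric in `min(...)`), which matches the "aggregations as we see fit" point above.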

Collaborator Author

I think local Prometheus should be opt-in rather than the default.

wirednkod and others added 2 commits September 20, 2022 16:23
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
- Create decorators registry and allow override by paras (wip)
- Explore how to get info from paras.
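One way to read "decorators registry and allow override by paras" is a lookup table of chain-spec customization functions, where a parachain can override individual entries and everything else falls back to a default. The TypeScript below is a hypothetical sketch of that idea only: `DecoratorRegistry`, the decorator names, and the `ChainSpec` shape are illustrative assumptions, not Zombienet's actual types.

```typescript
// Assumed shape: a chain spec is treated as an opaque JSON object here.
type ChainSpec = Record<string, unknown>;

// A decorator patches a chain spec and returns the patched copy.
type Decorator = (spec: ChainSpec, ...args: string[]) => ChainSpec;

// Illustrative decorator names only.
type DecoratorName = "addAuthority" | "clearAuthorities";

class DecoratorRegistry {
  private readonly defaults = new Map<DecoratorName, Decorator>();
  private readonly overrides = new Map<string, Map<DecoratorName, Decorator>>();

  registerDefault(name: DecoratorName, fn: Decorator): void {
    this.defaults.set(name, fn);
  }

  // `para` is a free-form key (e.g. a chain name) chosen by the para's maintainers.
  registerOverride(para: string, name: DecoratorName, fn: Decorator): void {
    const forPara = this.overrides.get(para) ?? new Map<DecoratorName, Decorator>();
    forPara.set(name, fn);
    this.overrides.set(para, forPara);
  }

  // Para-specific override first, then the default; fail loudly if neither exists.
  resolve(name: DecoratorName, para?: string): Decorator {
    const override = para !== undefined ? this.overrides.get(para)?.get(name) : undefined;
    const fn = override ?? this.defaults.get(name);
    if (!fn) {
      throw new Error(`no decorator registered for "${name}"`);
    }
    return fn;
  }
}

// Usage: a default decorator plus one para-specific override.
const registry = new DecoratorRegistry();
registry.registerDefault("addAuthority", (spec, who) => ({
  ...spec,
  authorities: [...((spec.authorities as string[] | undefined) ?? []), who],
}));
registry.registerOverride("moonbeam", "addAuthority", (spec, who) => ({
  ...spec,
  collators: [who],
}));
```

Keeping the fallback inside `resolve` means a para only has to override what actually differs for its chain.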

## Functional tasks
Contributor

@sandreim Sep 20, 2022

Zombienet Test SDK (Rust): it does not replace the deployment, which remains as is (TypeScript) plus the additional features listed above. The integration point is the JSON file output by the deployment, which is used to spawn the Zombienet Test SDK test environment.

Advantages of this approach:

  • one language to write the entire test (deployment part + the actual tests to be performed)
  • simple APIs to generate the network file (builder pattern)
  • simple APIs to query/assert metrics
  • ability to write more complex test logic
  • directly use or wrap subxt: open HRMP channels, send XCM, do runtime upgrades, etc.
  • opens up the Rust ecosystem of libraries for use in tests
  • feature parity with the current DSL (phasing out the DSL over time once we have the same feature set)
  • easier for people outside Parity to contribute to the project
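The "builder pattern" point can be illustrated with a short sketch. The proposal above targets Rust; to keep all examples in one language it is shown here in TypeScript. The config shape is loosely modeled on the `relaychain`/`parachains` layout of a Zombienet network file, but the exact field set, the defaults (`rococo-local`, `polkadot`), and the `NetworkBuilder` class are assumptions for illustration.

```typescript
// Assumed config shape, loosely modeled on a Zombienet network file.
interface NodeConfig {
  name: string;
  command?: string; // overrides the relay chain default command when set
  args?: string[]; // raw CLI flags passed through untouched
}

interface ParachainConfig {
  id: number;
  collator: NodeConfig;
}

interface NetworkConfig {
  relaychain: { chain: string; default_command: string; nodes: NodeConfig[] };
  parachains: ParachainConfig[];
}

// Hypothetical builder: each method records one setting and returns `this` for chaining.
class NetworkBuilder {
  private chain = "rococo-local";
  private defaultCommand = "polkadot";
  private readonly nodes: NodeConfig[] = [];
  private readonly parachains: ParachainConfig[] = [];

  withChain(chain: string): this {
    this.chain = chain;
    return this;
  }

  withDefaultCommand(command: string): this {
    this.defaultCommand = command;
    return this;
  }

  withNode(name: string, args: string[] = []): this {
    this.nodes.push({ name, args });
    return this;
  }

  withParachain(id: number, collator: NodeConfig): this {
    if (this.parachains.some((p) => p.id === id)) {
      throw new Error(`duplicate parachain id ${id}`);
    }
    this.parachains.push({ id, collator });
    return this;
  }

  // Validate and emit a plain object that can be serialized to the network file.
  build(): NetworkConfig {
    if (this.nodes.length === 0) {
      throw new Error("the relay chain needs at least one node");
    }
    return {
      relaychain: {
        chain: this.chain,
        default_command: this.defaultCommand,
        nodes: [...this.nodes],
      },
      parachains: [...this.parachains],
    };
  }
}

// Usage: two relay chain nodes and one parachain with a single collator.
const config = new NetworkBuilder()
  .withNode("alice")
  .withNode("bob", ["--log=parachain=debug"])
  .withParachain(100, { name: "collator01", command: "polkadot-parachain" })
  .build();
console.log(JSON.stringify(config, null, 2));
```

Validation happens once in `build()`, so invalid topologies fail before anything is deployed.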

@@ -0,0 +1,35 @@
Roadmap Zombienet v2

## Infra
Contributor

Deployment and management of long-lived test networks.

  • deploy and manage Versi using Zombienet, with functionality similar to the validator manager
  • makes obsolete the custom solution currently built and maintained by DevOps

@@ -0,0 +1,35 @@
Roadmap Zombienet v2

## Infra
Contributor

Deployment scalability improvements - up to 1000 validators.

pepoviola and others added 4 commits September 20, 2022 14:24
Co-authored-by: Nikos Kontakis <wirednkod@gmail.com>
Co-authored-by: Nikos Kontakis <wirednkod@gmail.com>
Co-authored-by: Nikos Kontakis <wirednkod@gmail.com>
Co-authored-by: Nikos Kontakis <wirednkod@gmail.com>
## Infra
- Chaos testing, add examples and explore possibilities in `native` and `podman` providers
- Add `docker` provider
- Add `nomad` provider
Contributor

👍

Collaborator Author

Hi @drahnr, do you have time for a quick chat to review some requests related to Nomad?
Thanks!


## Functional tasks
- Add subxt integration, allow to compile/run on the fly
- Move parser to pest (wip)
Contributor

❤️ happy to collab on this :)

Collaborator Author

@pepoviola Sep 21, 2022

Hey @drahnr, I started the work in this draft PR. The parser is already working, and I need to finish the wiring between the parser crate and the test runner. If you have time to review the PR, that would be awesome :)

Thanks!

pepoviola and others added 3 commits September 21, 2022 06:47
Co-authored-by: Nikos Kontakis <wirednkod@gmail.com>
Co-authored-by: Nikos Kontakis <wirednkod@gmail.com>
pepoviola and others added 2 commits September 30, 2022 07:36
Co-authored-by: Nikos Kontakis <wirednkod@gmail.com>
@bkchr
Member

bkchr commented Oct 7, 2022

Nice to see the roadmap for v2.

I'm really happy that the DSL is on its way out; I was never a fan of it. I read @sandreim's comment about using Rust, and while I understand his intention, I would say we should use TypeScript. Using TypeScript doesn't mean we will never have a Rust integration. However, the most important points here are time and resources (humans). IMO we should be able to move forward as fast as possible in different directions of usage with zombienet. For this, I think the current DSL is the main blocker. Using a raw TypeScript interface will give us access to the entire universe of polkadot.js and everything that is built around it. We would get direct access to all the tools to send transactions and interact with the node in all the required ways. For any other kind of interaction, like checking the status of metrics, there probably already exists a package, or we could write some thin layer around zombienet. My ideal way of interacting with zombienet would be similar to this old Rust integration test in Cumulus: https://github.com/paritytech/cumulus/blob/master/client/pov-recovery/tests/pov_recovery.rs

So, there would be ONE file per test that defines the test. No extra file to define the topology or whatever; I have already seen how you hacked iterations into the topology file. And having read that you want to create a UI for creating these files: IMO that is too much for now and not what we need. Our main audience is developers writing blockchains; these people feel comfortable with text files, not UIs ;) For spawning nodes, zombienet could provide a thin interface to do common things.

See the following as some sort of pseudocode (because I haven't learned TS yet, but I would do it for zombienet :P):

```
let alice = start_node(node = /bin/polkadot, args = "--alice");
let bob = start_node(node = /bin/polkadot, args = "--bob", bootnode = alice);

wait_for_start([alice, bob]);

register_parachain(wasm_file, state).await;

let collator = start_node(node = /bin/collator, args = "--collator");

collator.wait_for_blocks(7).await;
```

Something like this. I get that this is very rough, and we would also need a way for the node specification to work with the different kinds of providers, whether it runs natively, in Docker, or whatever. I think that this could be achieved in some way. And for sure there would be thousands of other ways you want to configure nodes. However, the goal should be to start simple and to be able to pass CLI flags etc. manually. Then, when we see some pattern like "connect to this node as bootnode" or "only connect to this node", we can help people by introducing functions for them. But we should not try to restrict how people interact with/start the nodes, as this would always make zombienet the centralized source that would need to be extended first to support new functionality. It should be the other way around: people can interact with new node APIs really easily by writing their own RPC calls or whatever, and then come to zombienet and contribute a common function that does this for everyone in a simple way.

I hope what I have written is clear 😅 And thank you for all the hard work ❤️
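A hypothetical TypeScript rendering of the pseudocode in this comment, to show how a single test file could drive a thin spawning interface while passing raw CLI flags through. `Provider`, `NodeHandle`, `waitForStart`, and `waitForBlocks` are invented names, not Zombienet exports, and `FakeProvider` stands in for a real provider (native, podman, k8s) so that the sketch runs on its own. `register_parachain` is left out because it would need a real RPC client, and a real implementation would sleep between polls.

```typescript
// Hypothetical API sketch: none of these names are existing Zombienet exports.
interface StartOptions {
  command: string; // binary to run, e.g. "/bin/polkadot"
  args: string[]; // raw CLI flags, passed through untouched
  bootnode?: NodeHandle; // optional node to use as bootnode
}

interface NodeHandle {
  readonly name: string;
  readonly options: StartOptions;
  isUp(): Promise<boolean>;
  bestBlock(): Promise<number>;
}

// Each provider (native, podman, k8s, ...) only needs to know how to start a node.
interface Provider {
  start(name: string, options: StartOptions): Promise<NodeHandle>;
}

// Stand-in provider so the sketch runs: every bestBlock() call "produces" one block.
class FakeProvider implements Provider {
  async start(name: string, options: StartOptions): Promise<NodeHandle> {
    let height = 0;
    return {
      name,
      options,
      isUp: async () => true,
      bestBlock: async () => ++height,
    };
  }
}

// Poll until every node reports it is up (a real version would sleep between polls).
async function waitForStart(nodes: NodeHandle[], maxPolls = 10): Promise<void> {
  for (const node of nodes) {
    let up = false;
    for (let i = 0; i < maxPolls && !up; i++) {
      up = await node.isUp();
    }
    if (!up) throw new Error(`${node.name} did not start`);
  }
}

// Poll the best block until it reaches `target`; returns the height observed.
async function waitForBlocks(node: NodeHandle, target: number, maxPolls = 100): Promise<number> {
  for (let i = 0; i < maxPolls; i++) {
    const best = await node.bestBlock();
    if (best >= target) return best;
  }
  throw new Error(`${node.name} did not reach block ${target}`);
}

// The whole test lives in one file: topology and assertions together.
async function collatorProducesBlocks(provider: Provider): Promise<number> {
  const alice = await provider.start("alice", { command: "/bin/polkadot", args: ["--alice"] });
  const bob = await provider.start("bob", {
    command: "/bin/polkadot",
    args: ["--bob"],
    bootnode: alice,
  });
  await waitForStart([alice, bob]);
  // register_parachain(wasm_file, state) is omitted: it needs a real RPC client.
  const collator = await provider.start("collator", { command: "/bin/collator", args: ["--collator"] });
  return waitForBlocks(collator, 7);
}
```

Because the test only depends on the `Provider` interface, the same file could in principle run against any backend that implements `start`.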

@pepoviola
Collaborator Author

Hi @bkchr, thanks for your feedback!! It's awesome :) I really like the approach of having only one file to build the network and run tests/interactions. TypeScript support is something we definitely want to maintain, and I think we can produce a nice and flexible way to build and interact with the topology from both Rust and TS.

We plan to have a session at the retreat about the new SDK to present our high-level design and collect feedback from other teams, in order to build the flexibility needed by the different use cases.

Again, thank you very much for your feedback 🙌 🙌

@Polkadot-Forum

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/chopsticks-substrate-testing-client/878/10

- Add more CLI subcommands
- Add js/subxt snippets ready to use in assertions (e.g. transfers)
- Add XCM support in built-in assertions
- Add support to start from a live network (fork-off) [check subalfred]
Contributor

From our experience (we are already forking off Moonbeam chains):
This will require chain-specific actions (changing the authoring values, the council, balances...), so a proper way to interact with the raw state is needed for each case.
It also requires dealing with a very large database (or exported state JSON file; Moonbeam's is > 5 GB), which is often not easy.

We should perhaps file sub-issues to all roadmap items for further discussion once finalized.

Certainly, this is a desirable feature. The main motivation I see is to test node or runtime changes against live network states as a pre-release check. This does require some ability to interact with raw state. To start from a live network state, the two major things are to allow swapping out the runtime (optional) and to allow modification of the state.

Collaborator Author

Thanks for your feedback @rphmeier! I will create a couple of umbrella issues to capture the vision/plan for the public roadmap and individual issues for items.

Thanks!
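The two operations named in this thread, swapping the runtime and modifying state, can be sketched on a raw chain spec, where genesis storage is a map from hex-encoded keys to hex-encoded values under `genesis.raw.top`, and the runtime wasm lives under the well-known `:code` key (`0x3a636f6465`). The `forkState` helper and its options are hypothetical. Computing real pallet storage prefixes needs twox128 hashing, which is out of scope here, so prefixes are passed in; a multi-GB state like Moonbeam's would also need a streaming approach rather than this in-memory one.

```typescript
// Minimal raw chain spec shape: genesis storage as hex key -> hex value.
interface RawChainSpec {
  name: string;
  genesis: { raw: { top: Record<string, string>; childrenDefault?: Record<string, unknown> } };
  [field: string]: unknown;
}

// Well-known storage key holding the runtime wasm: hex encoding of ":code".
const CODE_KEY = "0x3a636f6465";

interface ForkOptions {
  runtimeWasmHex?: string; // optional: swap out the runtime
  removePrefixes?: string[]; // hex key prefixes to wipe (e.g. a pallet's storage)
  overrides?: Record<string, string>; // chain-specific key/value patches
}

// Returns a patched copy; the input spec is not mutated.
// Assumes keys are lowercase hex, as in raw chain specs.
function forkState(spec: RawChainSpec, opts: ForkOptions): RawChainSpec {
  const prefixes = (opts.removePrefixes ?? []).map((p) => p.toLowerCase());
  const top: Record<string, string> = {};
  for (const [key, value] of Object.entries(spec.genesis.raw.top)) {
    if (!prefixes.some((p) => key.startsWith(p))) top[key] = value;
  }
  if (opts.runtimeWasmHex !== undefined) top[CODE_KEY] = opts.runtimeWasmHex;
  // Explicit overrides are applied last, so they win over everything above.
  Object.assign(top, opts.overrides ?? {});
  return { ...spec, genesis: { ...spec.genesis, raw: { ...spec.genesis.raw, top } } };
}
```

Chain-specific actions such as changing authoring keys or the council would then be expressed as `removePrefixes` plus `overrides` supplied per chain.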

@wirednkod self-requested a review October 27, 2022 16:08
pepoviola and others added 2 commits October 29, 2022 04:50
Co-authored-by: Nikos Kontakis <wirednkod@gmail.com>
Co-authored-by: Nikos Kontakis <wirednkod@gmail.com>
@samelamin
Contributor

samelamin commented Nov 3, 2022

Hi all, really happy that you are already planning a roadmap to help onboard new devs, that's great!

I would love to give some feedback if possible. New devs joining the ecosystem already face a really steep learning curve: learning Rust and Substrate, understanding the dependencies between Substrate/Polkadot/Cumulus, and adopting core blockchain primitives, to name a few. That is a lot!

I just want to spin up a chain and send some transactions; why do I need to learn K8s? 😕

So, as someone new, I would like a really simple way to spin up the entire infrastructure, ideally with a single command. That is the sort of user experience we should aim for.

As great as Zombienet is right now, there are a lot of prerequisites you have to install just to get a local chain running. Ironically, it was much simpler to run using polkadot-launch, despite the fact that you still had to have the binaries installed locally.

We already have the binaries precompiled as images on Docker Hub, so I would argue that the simplest example we can provide is just to use docker-compose. Parachain-launch already does something similar.

I recently submitted a PR to Cumulus to get everything (relay + chain) into a docker-compose file, but @bkchr kindly pointed me to zombienet.

I see from the roadmap that you are already planning to add a docker provider, but why not have a compose file as an in-between step until the provider is built?

For example, what if the image is just zombienet, with all the required dependencies installed, and it then runs an example file to spin up everything?

That way you reduce the number of steps needed to get a network up, while at the same time giving new devs a chance to understand what exactly is happening without going through the time-consuming process of learning how to set up zombienet.

Let me know if this feedback makes sense; I am also happy to contribute in any way.

@bkchr
Member

bkchr commented Nov 3, 2022

Besides the stuff I proposed above and that we talked about IRL, we should maybe have some kind of "adhoc" mode, as @samelamin proposed above. This adhoc mode could use a different format for the node declaration file. Or we use the same TypeScript syntax for spawning the nodes, but don't stop the processes when the script ends, and keep everything running until zombienet is killed. This could then be used for this kind of adhoc testing.
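A hypothetical sketch of the second variant of the "adhoc" mode: spawn the network, run the script, then keep everything alive until a stop is requested, instead of tearing down as soon as the script ends. `RunningNetwork` and `runAdhoc` are illustrative names, not Zombienet APIs. In a Node.js CLI, `untilStopped` could be a promise resolved from a `process.once("SIGINT", ...)` handler, which maps to "until zombienet is killed".

```typescript
// Hypothetical handle to a spawned network.
interface RunningNetwork {
  stop(): Promise<void>;
}

// Ad-hoc mode: spawn, run the script, then keep the nodes alive until
// `untilStopped` resolves, instead of tearing everything down when the script returns.
async function runAdhoc(
  spawn: () => Promise<RunningNetwork>,
  script: (net: RunningNetwork) => Promise<void>,
  untilStopped: Promise<void>,
): Promise<void> {
  const net = await spawn();
  try {
    await script(net);
    await untilStopped;
  } finally {
    // Runs on normal shutdown and also when the script throws.
    await net.stop();
  }
}
```

The only difference from a regular test run is the extra `await untilStopped`; the teardown path stays the same.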

@dzmitry-lahoda
Contributor

dzmitry-lahoda commented Jan 13, 2023

About fork:

Can do:

  1. Separate command to download a snapshot
  2. Already works with relay and parachains
  3. Seems to work with an 840 MB runtime
  4. HTTP/WS downloads
  5. Does ZN, as far as I can see, support pointing it to its own genesis and runtime?

So I could have a separate step to download both, and then some docs on how to configure ZN, right?

Do some fixes need to be made to ZN so that it uses the right genesis? Would the relay genesis hash be swapped to rococo-local in the provided file?

For fork, I would prefer a separate command to download the data and then run ZN on it.

Why? Because our cron job can download it and put it somewhere easily reachable via HTTP GET, from which we can grab it and run Nix-based tests.

@wirednkod changed the title from [DRAFT] Roadmap v2 to Roadmap v2 on Feb 27, 2023
@pepoviola marked this pull request as ready for review February 27, 2023 09:41
@pepoviola
Collaborator Author

pepoviola commented Apr 20, 2023

I will close this one since the roadmap is now public and its items (or new ones) can be discussed there.
Thanks!

@pepoviola closed this Apr 20, 2023
10 participants