polkadot-stps's Introduction

Polkadot sTPS

This repository is meant to aggregate performance benchmarks from the Polkadot ecosystem and act as a source of truth for standard Transactions Per Second (sTPS).

The measurements are intended to replicate different possible network topologies and provide reference estimates of throughput capacity of substrate-based multichain environments.

zombienet is used as the main tool for spawning networks and sending transactions to them.

Please refer to docs for more information.

polkadot-stps's Issues

Add versi metadata

Versi metadata seems to be ahead of rococo metadata, so we need to make sure there is a feature for it in utils/src/lib.rs

  • Add Versi feature flag
  • Add Versi feature flag to GitHub action
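A minimal sketch of such a feature gate (the module and constant names are illustrative, not the actual contents of utils/src/lib.rs):

```rust
// Select which chain metadata to compile in via a cargo feature.
// Building with `cargo build --features versi` picks the Versi variant;
// the default build keeps the rococo metadata.
#[cfg(feature = "versi")]
pub mod metadata {
    pub const CHAIN: &str = "versi";
}

#[cfg(not(feature = "versi"))]
pub mod metadata {
    pub const CHAIN: &str = "rococo";
}

fn main() {
    println!("compiled against {} metadata", metadata::CHAIN);
}
```

The GitHub action job for Versi would then pass `--features versi` to cargo.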

dispatch transactions to multiple nodes in parallel

So far we have been focusing on a single node setup. However, we need to dispatch transactions against multiple nodes in parallel.

Zombienet DSL doesn't support this feature yet. Simply declaring multiple lines for multiple nodes produces sequential behaviour: each node receives all its transactions and processes them before the next node receives any of its own.

This new feature was requested at paritytech/zombienet#207

After it is implemented, we need to modify the sender module as well as utils.js so that a different fraction of funded-accounts.json is used for each node's transactions.

For example, assume there are 5 nodes and 50k funded accounts: 10k accounts are used for alice's transactions, another 10k for bob's, and so on, with transactions sent to all 5 nodes at the same time.

No funded account is ever used for two different nodes, as that would violate the sTPS pre-conditions.
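The split can be sketched as follows (plain strings stand in for the real funded-accounts.json entries):

```rust
// Split the funded accounts into disjoint per-node chunks so that no
// account is ever reused across nodes (an sTPS pre-condition).
fn chunk_accounts(accounts: &[String], num_nodes: usize) -> Vec<&[String]> {
    let chunk_size = accounts.len() / num_nodes;
    (0..num_nodes)
        .map(|i| &accounts[i * chunk_size..(i + 1) * chunk_size])
        .collect()
}

fn main() {
    // 50k accounts over 5 nodes -> 10k per node, as in the example above
    let accounts: Vec<String> = (0..50_000).map(|i| format!("acct-{i}")).collect();
    let chunks = chunk_accounts(&accounts, 5);
    assert!(chunks.iter().all(|c| c.len() == 10_000));
}
```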

Fix `@polkadot/api` installation

Looks like the CI silently fails with an error that @polkadot/api is not installed, e.g. on the master branch: https://github.com/paritytech/polkadot-stps/runs/6239932068

1) custom js( System Events Custom JS )
       collator01: js-script ./0008-custom.js return is greater than 1 within 200 seconds:
     Error: Cannot find module '@polkadot/api'

We can probably pass some npm flags to make it work. Adding `set -e` to the zombienet.sh script will also ensure that failed commands are not silently ignored.

TPS meter

Just measuring the speed of submitting extrinsics to the mempool does not tell us how many are being included per block.
A second script which analyzes past blocks and prints how many extrinsics got included per block would be good.
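The core of such a meter could look like this (the per-block counts are illustrative; a real script would walk past blocks over RPC):

```rust
// Given the number of extrinsics found in each past block and the block
// time, report the per-block inclusion rate in transactions per second.
fn tps_per_block(extrinsics_per_block: &[u32], block_time_secs: f64) -> Vec<f64> {
    extrinsics_per_block
        .iter()
        .map(|&n| n as f64 / block_time_secs)
        .collect()
}

fn main() {
    // illustrative counts for three consecutive blocks, 12s block time
    for (i, tps) in tps_per_block(&[1094, 708, 0], 12.0).iter().enumerate() {
        println!("block {i}: {tps:.2} tx/s included");
    }
}
```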

sending transactions from pods

@pepoviola has mentioned that zombienet allows us to send the transactions from the same pod where the target node is running.

we should use this issue to discuss and brainstorm a few things:

  • we'll need a script to be executed on the pod... what does this script look like, and how do we get zombienet to trigger it?
  • by doing this, we are essentially eliminating the variable of network latency between transaction sender and target node... is that a desirable thing? IMO yes, but I need to reflect more

How many transfer events fit in one parablock?

Bounds on POV size and relay block size (i.e. availability cores) are important factors to consider when measuring TPS, in particular for parachains. Indeed, the maximum data-rate of the relay chain is a function of how many availability cores exist, times the maximum POV size. Hence, given that a Transfer occupies single_transfer_proof_size bytes, the total number of such Transfer events that fit on the relay chain at any given time can roughly be modelled as:

max_parachain_tps = ((max_POV_size/single_transfer_proof_size)*num_availability_cores)/parablock_inclusion_time

where parablock_inclusion_time is the time it takes to include a parablock (candidate) in the relay chain, which varies depending on whether synchronous or asynchronous backing is used.

If we know how many Transfer events will fit in a particular POV, we can then start to think more clearly about how many sender binaries we should use to maximise throughput during the (s)TPS test.

This should definitely also relate to the weight associated with the Balances pallet transfers. I would like a way to go from weight to bytes, but I don't think this is possible. So ideally, there would be a way to calculate the size in bytes of each event given some pallet. See https://substrate.stackexchange.com/questions/518/what-does-pov-stand-for/519#519 for more context.

Large `funded-accounts.json` causes invalid chain spec

When using a funded-accounts.json file with 100k accounts, the following error can be observed.
Not sure what is going on. JS has a string length limit of 512 MiB, above which the JS tooling becomes useless, but we are currently far from that limit.

Error: Invalid chain spec raw file generated.
    at /snapshot/zombienet/dist/providers/k8s/chain-spec.js:102:19
    at Generator.next (<anonymous>)
    at fulfilled (/snapshot/zombienet/dist/providers/k8s/chain-spec.js:5:58)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)

Verifying extrinsic success

Currently we have this wait_for_events.js script, which does not always seem to get all the events.
We probably need a more resilient way of waiting for the extrinsics to be finalized.
This is important to make sure that there are no errors in our setup that produce invalid extrinsics.

The hotfix for now is to sleep 60s before calling the script.

Pass in para_id argument to tps

Right now when deploying TPS in Versi, the relay client gets all CandidateIncluded events for all parachains. This means it may send a para_head to a Parachain client which can't find the block associated with the hash. To avoid this, the tps binary must only pass para_heads from CandidateIncluded events for the specific para_id it is subscribing to.
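The fix can be sketched as a filter over incoming events (the types below are simplified stand-ins for the real event types):

```rust
// Forward only para_heads from CandidateIncluded events that match the
// para_id this tps instance subscribed to; drop all other parachains.
struct CandidateIncluded {
    para_id: u32,
    para_head: String, // stand-in for the head hash
}

fn heads_for(events: Vec<CandidateIncluded>, subscribed_para_id: u32) -> Vec<String> {
    events
        .into_iter()
        .filter(|e| e.para_id == subscribed_para_id)
        .map(|e| e.para_head)
        .collect()
}

fn main() {
    let events = vec![
        CandidateIncluded { para_id: 1000, para_head: "0xaa".to_string() },
        CandidateIncluded { para_id: 2000, para_head: "0xbb".to_string() },
    ];
    // only the head for the subscribed para_id 1000 is forwarded
    assert_eq!(heads_for(events, 1000), vec!["0xaa".to_string()]);
}
```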

Setting both --genesis and --para-finality for `tps`

Currently, the --para-finality option was provided to deploy tps in Versi. Versi is a long-living network, so an sTPS deployment (i.e. of a set of senders per tps binary) would not start at genesis. Therefore, passing --para-finality and --genesis together currently does not work: the logic attempts to calculate the parablock time by comparing the timestamp of the first parablock with that of the (non-existent) parablock before it. This leads to unwrap() being called on a None, which results in a panic!.

This should be fixed so zombienet can be used to monitor parablock TPS.
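One way the fix could look (the function and default value are illustrative, not the repo's actual code):

```rust
// When there is no previous parablock (deploying mid-chain on Versi with
// --para-finality), fall back to a default parablock time instead of
// unwrapping a None timestamp.
fn parablock_time_ms(prev_timestamp: Option<u64>, current_timestamp: u64) -> u64 {
    const DEFAULT_PARABLOCK_TIME_MS: u64 = 12_000; // assumed sync-backing cadence
    match prev_timestamp {
        Some(prev) => current_timestamp - prev,
        None => DEFAULT_PARABLOCK_TIME_MS, // first observed block: no predecessor
    }
}

fn main() {
    assert_eq!(parablock_time_ms(None, 1_000), 12_000); // no panic on first block
    assert_eq!(parablock_time_ms(Some(988), 1_000), 12);
}
```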

Generate pre-funded accounts in Polkadot format

Currently, the funder module generates pre-funded addresses in generic Substrate format (5...).

As soon as

"chain": "rococo-local",
is switched to polkadot-dev, the setup breaks with:

thread 'main' panicked at 'Account has insufficient funds', src/pre/mod.rs:49:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
rust process exited with code 101
...
Error: Rpc(Request("{\"jsonrpc\":\"2.0\",\"error\":{\"code\":1010,\"message\":\"Invalid Transaction\",\"data\":\"Inability to pay some fees (e.g. account balance too low)\"},\"id\":4}"))
rust process exited with code 1

In order to fulfil #15, the funder module needs to be adapted so it generates addresses in Polkadot format (1...).

Zombienet k8 Cluster for sTPS measurements

We are attempting to use Zombienet as the main network orchestration tool for collecting sTPS measurements. The idea is to use a Kubernetes cluster that isolates one node per bare-metal machine. That means we need a new Kubernetes Cluster in Parity's infrastructure.

The machines should be as close as possible to what is currently used for the Polkadot Validator infrastructure, and we need to document them in the methodology docs.

We are starting with simple setups, where 10 machines should be enough. But as the setups scale up, that number should also increase in the near future.

cc @pepoviola

Make the sender parallel

This would improve the Versi deployments: instead of coordinating between Pods and relying on the Kubernetes scheduler, we could coordinate using the multi-threaded tokio runtime within a single process.
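A minimal sketch of the idea with plain std threads (the real implementation would use the multi-threaded tokio runtime, and `send_chunk` is a hypothetical stand-in for signing and submitting a chunk of transactions):

```rust
use std::thread;

// Stand-in for signing and submitting `txs` transactions to one node.
fn send_chunk(_node: usize, txs: usize) -> usize {
    txs
}

fn main() {
    // one concurrent sender per node, all inside a single process
    let handles: Vec<_> = (0..4)
        .map(|node| thread::spawn(move || send_chunk(node, 2_500)))
        .collect();
    let total: usize = handles.into_iter().map(|h| h.join().unwrap()).sum();
    assert_eq!(total, 10_000); // all chunks sent without cross-Pod coordination
}
```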

Investigate why transactions aren't counted

When testing tps and sender locally against a natively running zombienet, I encountered an issue with the tps counter.

The tps binary was expecting more transactions than it counted, even though the sender successfully sent all of them.
This appears to be because counting the transfers takes longer than the 12s average parablock time, during which new CandidateIncluded events may arrive, but I am not entirely certain. It could also be that some transactions are dropped.

In principle, with the current logic, all transfers are counted before the next candidate is processed. I would be quite surprised if counting 1094 transfers took longer than 12s, since it works just fine when submitting fewer transactions.

Below are the logs from submitting 10,000 transactions from 1 sender. Notably, the problem does not occur when sending 1000, 2000, or 3000 transactions, so I doubt it has anything to do with calculation speed; it is more likely due to dropped transactions. The sender also took no more than roughly 1s to send all 10,000 transactions.

The reason this is an issue is that when running these parachain TPS tests with zombienet, the tps binary never receives an exit signal, so the zombienet .dsl tests cannot pass, which means it is not possible to submit a high number of transactions in CI.

Yet this is not really a fix, as it seems to point to a larger concern related to the memory pool and the lifetime of the extrinsics, given that parablock numbers increment one by one.

[2023-06-09T16:41:24Z DEBUG tps] New ParaHead: 0x275496df43e4dbff026ff0ff514029dc1423065f5f17ede1d87c721a35077dc5 for ParaId: Id(1000)
[2023-06-09T16:41:24Z DEBUG tps] Received ParaHead: 0x275496df43e4dbff026ff0ff514029dc1423065f5f17ede1d87c721a35077dc5
[2023-06-09T16:41:24Z DEBUG tps] Parablock time estimated at: 12020ms
[2023-06-09T16:41:24Z DEBUG tps] Checking extrinsics in parablock: 7
[2023-06-09T16:41:30Z DEBUG tps] Found 1094 transfers in parablock: 7
[2023-06-09T16:41:30Z INFO  tps] TPS on parablock 7: 91.01497
[2023-06-09T16:41:30Z INFO  tps] Total transactions processed: 6390
[2023-06-09T16:41:30Z DEBUG tps] Remaining transactions to process: 3610
[2023-06-09T16:41:36Z DEBUG tps] New ParaHead: 0xcdfe41469f2d3fa5f8ab649c16eb06fbbf72150377a5fe887a5c976b23c1f0d6 for ParaId: Id(1000)
[2023-06-09T16:41:36Z DEBUG tps] Received ParaHead: 0xcdfe41469f2d3fa5f8ab649c16eb06fbbf72150377a5fe887a5c976b23c1f0d6
[2023-06-09T16:41:36Z DEBUG tps] Parablock time estimated at: 11982ms
[2023-06-09T16:41:36Z DEBUG tps] Checking extrinsics in parablock: 8
[2023-06-09T16:41:38Z DEBUG tps] Found 708 transfers in parablock: 8
[2023-06-09T16:41:38Z INFO  tps] TPS on parablock 8: 59.08863
[2023-06-09T16:41:38Z INFO  tps] Total transactions processed: 7098
[2023-06-09T16:41:38Z DEBUG tps] Remaining transactions to process: 2902
[2023-06-09T16:41:48Z DEBUG tps] New ParaHead: 0x0c19adca9df721dd37263b707c39d031272547ed3ce8ca4ff582417065996866 for ParaId: Id(1000)
[2023-06-09T16:41:48Z DEBUG tps] Received ParaHead: 0x0c19adca9df721dd37263b707c39d031272547ed3ce8ca4ff582417065996866
[2023-06-09T16:41:48Z DEBUG tps] Parablock time estimated at: 12021ms
[2023-06-09T16:41:48Z DEBUG tps] Checking extrinsics in parablock: 9
[2023-06-09T16:41:48Z DEBUG tps] Found 0 transfers in parablock: 9

As can be seen in the logs, a total of 2902 transactions remained to be counted for ParaId: Id(1000). After this last candidate with transfers was checked, no further candidates included any transfers.
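One possible way for tps to terminate cleanly in this situation (a hypothetical stopping rule, not the current behaviour) is to treat the run as finished once a parablock contains zero transfers and report the shortfall as likely dropped:

```rust
// Stop counting after the first empty parablock and report how many of the
// expected transactions never showed up. The counts mirror the logs above:
// earlier parablocks (summarised), then blocks 7, 8, and the empty block 9.
fn dropped(expected: u32, transfers_per_parablock: &[u32]) -> u32 {
    let counted: u32 = transfers_per_parablock
        .iter()
        .take_while(|&&n| n > 0) // a zero-transfer parablock ends the run
        .sum();
    expected.saturating_sub(counted)
}

fn main() {
    let per_block = [5_296, 1_094, 708, 0];
    println!("likely dropped: {}", dropped(10_000, &per_block)); // 2902
}
```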

config-js test fails

@pepoviola I copied these three files from the zombienet tests directory:

When I try to invoke the main script, I run into the following error message:

$ ./zombienet.sh test tests/examples/0008-custom-js.feature 


  custom js( System Events Custom JS )
	 Launching network... this can take a while.

	 Using provider: native

	 Launching network under namespace: zombie-aaf74141e26f46f042609a2b0f64f763
		 Using temporary directory: /tmp/zombie-aaf74141e26f46f042609a2b0f64f763_-6254-S2zJxzvfr3OW

	launching temp
		 with command: bash -c ./polkadot build-spec --chain rococo-local --disable-default-bootnode > /tmp/zombie-aaf74141e26f46f042609a2b0f64f763_-6254-S2zJxzvfr3OW/cfg/rococo-local-plain.json
		temp is ready!

		🧹 Starting with a fresh authority set...
			  👤 Added Genesis Authority alice - 5CrY25p7g85soBZyxBNRHh3KU6WpcpyGJdYSfwqbkRVbFrbq
			  👤 Added Genesis Authority bob - 5HR2hzzJi4V3VKKsVr5HSv6C3d1TuuBw45FZfPTxjAh1LSpo

	launching temp-1
		 with command: bash -c target/release/parachain-collator build-spec --disable-default-bootnode > /tmp/zombie-aaf74141e26f46f042609a2b0f64f763_-6254-S2zJxzvfr3OW/cfg/rococo-local-plain.json
		temp-1 is ready!

	launching temp-2
		 with command: bash -c target/release/parachain-collator build-spec --chain /tmp/zombie-aaf74141e26f46f042609a2b0f64f763_-6254-S2zJxzvfr3OW/rococo-local-100-plain.json --disable-default-bootnode  --raw > /tmp/zombie-aaf74141e26f46f042609a2b0f64f763_-6254-S2zJxzvfr3OW/rococo-local-100-raw.json
		temp-2 is ready!

	launching temp-collator
		 with command: bash -c target/release/parachain-collator export-genesis-state --chain /tmp/zombie-aaf74141e26f46f042609a2b0f64f763_-6254-S2zJxzvfr3OW/rococo-local-100.json > /tmp/zombie-aaf74141e26f46f042609a2b0f64f763_-6254-S2zJxzvfr3OW/cfg/genesis-state && target/release/parachain-collator export-genesis-wasm --chain /tmp/zombie-aaf74141e26f46f042609a2b0f64f763_-6254-S2zJxzvfr3OW/rococo-local-100.json > /tmp/zombie-aaf74141e26f46f042609a2b0f64f763_-6254-S2zJxzvfr3OW/cfg/genesis-wasm
		temp-collator is ready!

		  ✓ Added Genesis Parachain 100

	launching temp-3
		 with command: bash -c ./polkadot build-spec --chain /tmp/zombie-aaf74141e26f46f042609a2b0f64f763_-6254-S2zJxzvfr3OW/rococo-local-plain.json --disable-default-bootnode  --raw > /tmp/zombie-aaf74141e26f46f042609a2b0f64f763_-6254-S2zJxzvfr3OW/rococo-local-raw.json
		temp-3 is ready!

		 Chain name: Rococo Local Testnet

		 ⚙ Clear Boot Nodes

	launching alice
		 with command: ./polkadot --chain /tmp/zombie-aaf74141e26f46f042609a2b0f64f763_-6254-S2zJxzvfr3OW/alice/cfg/rococo-local.json --name alice --rpc-cors all --unsafe-rpc-external --rpc-methods unsafe --unsafe-ws-external --alice --no-mdns --node-key 2bd806c97f0e00af1a1fc3328fa763a9269723c8db8fac4f93af71db186d6e90 --no-telemetry --prometheus-external --validator --prometheus-port 45263 --rpc-port 46555 --ws-port 37935 --listen-addr /ip4/0.0.0.0/tcp/40677/ws --base-path /tmp/zombie-aaf74141e26f46f042609a2b0f64f763_-6254-S2zJxzvfr3OW/alice/data
		alice is ready!
	alice running

		 You can follow the logs of the node by running this command: 

			 tail -f  /tmp/zombie-aaf74141e26f46f042609a2b0f64f763_-6254-S2zJxzvfr3OW/alice.log

		 ⚙ Added Boot Nodes:  /ip4/127.0.0.1/tcp/40677/ws/p2p/12D3KooWQCkBm1BYtkHpocxCwMgR8yjitEeHGx8spzcDLGt2gkBm

	launching bob
		 with command: ./polkadot --chain /tmp/zombie-aaf74141e26f46f042609a2b0f64f763_-6254-S2zJxzvfr3OW/bob/cfg/rococo-local.json --name bob --rpc-cors all --unsafe-rpc-external --rpc-methods unsafe --unsafe-ws-external --bob --no-mdns --node-key 81b637d8fcd2c6da6359e6963113a1170de795e4b725b84d1e0b4cfd9ec58ce9 --no-telemetry --prometheus-external --validator --bootnodes /ip4/127.0.0.1/tcp/40677/ws/p2p/12D3KooWQCkBm1BYtkHpocxCwMgR8yjitEeHGx8spzcDLGt2gkBm --prometheus-port 44659 --rpc-port 43675 --ws-port 40441 --listen-addr /ip4/0.0.0.0/tcp/40661/ws --base-path /tmp/zombie-aaf74141e26f46f042609a2b0f64f763_-6254-S2zJxzvfr3OW/bob/data
		bob is ready!
	bob running

		 You can follow the logs of the node by running this command: 

			 tail -f  /tmp/zombie-aaf74141e26f46f042609a2b0f64f763_-6254-S2zJxzvfr3OW/bob.log
	 All relay chain nodes spawned...

	launching collator01
		 with command: target/release/parachain-collator --name collator01 --alice --collator --force-authoring --chain /tmp/zombie-aaf74141e26f46f042609a2b0f64f763_-6254-S2zJxzvfr3OW/collator01/cfg/rococo-local-100.json --base-path /tmp/zombie-aaf74141e26f46f042609a2b0f64f763_-6254-S2zJxzvfr3OW/collator01/data --listen-addr /ip4/0.0.0.0/tcp/34097/ws --ws-port 36809
Error: Timeout(30) for node : collator01
    at NativeClient.<anonymous> (/snapshot/zombienet/dist/providers/native/nativeClient.js:267:19)
    at Generator.next (<anonymous>)
    at fulfilled (/snapshot/zombienet/dist/providers/native/nativeClient.js:5:58)

	 Node's logs are available in /tmp/zombie-aaf74141e26f46f042609a2b0f64f763_-6254-S2zJxzvfr3OW/logs

Is there some parameter missing from the config files?

Double check the configs

Writing down some points before I forget them again:

  • Use ParityDB instead of RocksDB for more performance
  • Use polkadot-dev instead of rococo for correct weights
  • Ensure use of the wasm-compiled executor (should be the default now AFAIK): --wasm-execution=compiled --execution=wasm

Manually set the nonce for signing

signer.set_nonce should be used, since otherwise the last nonce is requested from the RPC.
Just noting it here so it is not forgotten.
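The idea can be sketched with a local counter (`NonceTracker` is a hypothetical helper, not the repo's actual signer type):

```rust
// Fetch the account's starting nonce from the RPC once, then increment
// locally for each signed transaction, avoiding a round-trip per tx.
struct NonceTracker {
    next: u64,
}

impl NonceTracker {
    fn new(starting_nonce: u64) -> Self {
        Self { next: starting_nonce }
    }

    // returns the nonce to sign with and advances the local counter
    fn next_nonce(&mut self) -> u64 {
        let n = self.next;
        self.next += 1;
        n
    }
}

fn main() {
    let mut tracker = NonceTracker::new(5); // starting nonce fetched once
    assert_eq!(tracker.next_nonce(), 5);
    assert_eq!(tracker.next_nonce(), 6); // no further RPC calls needed
}
```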

Error: Rpc(Transport(Error in the WebSocket handshake: i/o error: unexpected end of file

while testing on kubernetes mode, I am facing the following errors:

$ ./zombienet-linux -p kubernetes test tests/stps/relay.feature
...
    ✔ alice: is up within 600 secs (73ms)
[2022-05-25T18:53:22Z INFO  utils] Checking sTPS pre-conditions (account nonces and free balances).
rust process exited with code 0
    ✔ alice: js-script ./utils.js with "check_pre_conditions" return is 0 within 600 secs (533ms)
[2022-05-25T18:53:22Z INFO  utils] Node 0: Reading funded accounts from: "tests/stps/funded-accounts.json"
[2022-05-25T18:53:22Z INFO  utils] Node 2: Reading funded accounts from: "tests/stps/funded-accounts.json"
[2022-05-25T18:53:23Z INFO  utils] Node 1: Reading funded accounts from: "tests/stps/funded-accounts.json"
[2022-05-25T18:53:23Z INFO  utils] Node 3: Reading funded accounts from: "tests/stps/funded-accounts.json"
Error: Rpc(Transport(Error in the WebSocket handshake: i/o error: unexpected end of file

Caused by:
    0: i/o error: unexpected end of file
    1: unexpected end of file))
[2022-05-25T18:53:23Z INFO  utils::sender] Node 0: signing 4096 transactions
rust process exited with code 1
child process exited
[2022-05-25T18:53:23Z INFO  utils::sender] Node 2: signing 4096 transactions
[2022-05-25T18:53:23Z INFO  utils::sender] Node 1: signing 4096 transactions
[2022-05-25T18:54:14Z INFO  utils::sender] Node 0: sending 4096 transactions in chunks of 50
[2022-05-25T18:54:15Z INFO  utils::sender] Node 1: sending 4096 transactions in chunks of 50
[2022-05-25T18:54:15Z INFO  utils::sender] Node 2: sending 4096 transactions in chunks of 50
[2022-05-25T18:54:15Z INFO  utils::sender] Node 0: 350 txs sent in 1172 ms (298.38 /s)
[2022-05-25T18:54:16Z INFO  utils::sender] Node 1: 300 txs sent in 1060 ms (283.01 /s)
[2022-05-25T18:54:16Z INFO  utils::sender] Node 2: 200 txs sent in 1409 ms (141.94 /s)
[2022-05-25T18:54:17Z INFO  utils::sender] Node 1: 600 txs sent in 1035 ms (579.69 /s)
[2022-05-25T18:54:17Z INFO  utils::sender] Node 2: 400 txs sent in 1069 ms (374.12 /s)

A few patterns that caught my attention:

  • this does not happen on native mode.
  • the error message happens right after utils starts reading funded accounts from: tests/stps/funded-accounts.json, although there's no RPC call in this process... it's just loading a json file locally.
  • the error message happens before any utils::sender starts signing the transactions to be dispatched, which is way before any actual RPC call happens.
  • there's one utils::sender per node. As I increase the number of nodes in the setup, it's always the last utils::sender that dies (e.g.: 5 nodes, no txs are dispatched for node 4)

@niklasad1 does this look like anything related to jsonrpsee to you?
All the points listed above tell me it is not, but maybe your experienced eyes can rule it out from a different perspective.

cc @ggwpez @pepoviola

Block finalisation checks

Currently, when deploying sTPS to measure TPS on a parachain, we assume that the next relay-chain block includes the parachain candidate containing the Transfer events submitted by the sender to the collator. However, we do not check for parablock finality when calculating tps.

We should change tps to optionally guarantee that the parachain blocks are in fact included in the relay chain when measuring TPS on the parachain side.
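The check can be sketched as follows (numeric values stand in for real block hashes; the real code would track finalised relay heads from a subscription):

```rust
use std::collections::HashSet;

// Only count transfers from parablocks whose including relay block has
// been finalised; skip candidates whose relay block may still be reverted.
fn finalised_parablocks(candidates: &[(u32, u64)], finalised: &HashSet<u64>) -> Vec<u32> {
    candidates
        .iter()
        .filter(|(_, relay_hash)| finalised.contains(relay_hash))
        .map(|(number, _)| *number)
        .collect()
}

fn main() {
    let finalised: HashSet<u64> = HashSet::from([0xaa, 0xbb]);
    // (parablock number, hash of the relay block that included it)
    let candidates: [(u32, u64); 3] = [(7, 0xaa), (8, 0xbb), (9, 0xcc)];
    // parablock 9's relay block is not finalised yet, so it is excluded
    assert_eq!(finalised_parablocks(&candidates, &finalised), vec![7, 8]);
}
```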

Custom genesis with pre-funded accounts

One of the requirements for the sTPS benchmark is that neither sender nor receiver accounts have been touched so far.
This implies that we need a custom genesis block. I generated some in the past with 1M pre-funded accounts and will try it out.
