Comments (14)

barnabasbusa commented on July 17, 2024

Nimbus reports doppelganger detection:

NTC 2024-04-05 14:40:50.011+00:00 Doppelganger detection active - skipping validator duties while observing the network topics="val_pool" validator=867e8956 slot=107 doppelCheck=none() activationEpoch=0
INF 2024-04-05 14:40:54.000+00:00 Slot start                                 slot=108 epoch=3 attestationIn=4s blockIn=1m validators=64 node_status=synced delay=171us45ns

from lodestar.

tersec commented on July 17, 2024

Is this just the doppelganger? That's intentional. Does --doppelganger-detection=off work?

tersec commented on July 17, 2024

Nimbus doesn't use doppelganger protection on sync committees because they're not slashable.

nflaig commented on July 17, 2024

@tersec regarding the getAggregate error, also reported in #6631, is Nimbus by any chance requesting an aggregate attestation even if no validator is aggregator? We had the same issue with Lighthouse previously and it was fixed on their end sigp/lighthouse#4712

See #6634 (comment)

barnabasbusa commented on July 17, 2024

even with --doppelganger-detection=off I'm getting:

ERR 2024-04-05 18:03:31.097+00:00 Beacon node provides unexpected response   reason="Serialization error;200;produceBlockV3(best);unexpected-data" node=http://172.16.0.31:8562[Lodestar/v1.17.0/f2ec0d4] node_index=0 node_roles=AGBSDT
WRN 2024-04-05 18:03:31.097+00:00 Unable to retrieve block data              reason="Serialization error;200;produceBlockV3(best);unexpected-data" wall_slot=28 validator_index=78 service=block_service slot=28 validator=8890da28@78

nflaig commented on July 17, 2024

> even with --doppelganger-detection=off I'm getting:
>
> ERR 2024-04-05 18:03:31.097+00:00 Beacon node provides unexpected response   reason="Serialization error;200;produceBlockV3(best);unexpected-data" node=http://172.16.0.31:8562[Lodestar/v1.17.0/f2ec0d4] node_index=0 node_roles=AGBSDT
> WRN 2024-04-05 18:03:31.097+00:00 Unable to retrieve block data              reason="Serialization error;200;produceBlockV3(best);unexpected-data" wall_slot=28 validator_index=78 service=block_service slot=28 validator=8890da28@78

The "unexpected data" might be because we are returning execution_payload_source metadata field in the response, as we allow producing a blinded local block, but this field is not officially part of the block v3 API; it was only proposed in ethereum/beacon-APIs#387 but not included.

See #6634 (comment)

nflaig commented on July 17, 2024

> The "unexpected data" might be because we are returning execution_payload_source metadata field in the response

After reviewing how Nimbus handles this case, it does not seem to be the issue, as it only uses metadata provided in headers (which Lodestar is setting) and only parses the response.data

beacon_chain/validator_client/api.nim#L2145

res = decodeBytes(ProduceBlockResponseV3, response.data, response.contentType, version, blinded, executionValue, consensusValue)

And even there they seem to allow unknown fields

beacon_chain/spec/eth2_apis/eth2_rest_serialization.nim#L3950-L3952

ok(RestJson.decode(value, T, requireAllFields = true, allowUnknownFields = true))
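As a rough illustration of what `requireAllFields = true, allowUnknownFields = true` would mean for a decoder, here is a minimal Python sketch. It is not Nimbus's implementation: `decode_lenient`, the `REQUIRED_FIELDS` set, and the example values are made up for this illustration.

```python
import json

# Hypothetical required fields for this example; not the real response schema.
REQUIRED_FIELDS = {"version", "data"}


def decode_lenient(raw: str, required=REQUIRED_FIELDS) -> dict:
    """Require every known field, but ignore fields the decoder does not know."""
    obj = json.loads(raw)
    missing = set(required) - set(obj)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    # Unknown keys such as execution_payload_source are dropped, not rejected.
    return {key: obj[key] for key in required}


# Illustrative payload with one extra, non-standard field.
with_extra = '{"version": "deneb", "data": {}, "execution_payload_source": "engine"}'
decoded = decode_lenient(with_extra)
```

Under these assumptions an extra `execution_payload_source` key is silently ignored, while a payload that lacks `data` would be rejected, which is consistent with the extra metadata field not being the cause of the serialization error.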

Based on just reviewing Lodestar and Nimbus code, it's hard to tell what the actual issue is; we would need to inspect the data we are sending to further investigate this.

nflaig commented on July 17, 2024

> @tersec regarding the getAggregate error, also reported in #6631, is Nimbus by any chance requesting an aggregate attestation even if no validator is aggregator? We had the same issue with Lighthouse previously and it was fixed on their end sigp/lighthouse#4712

This is not the case, Nimbus will only request an aggregated attestation if at least one validator is aggregator

beacon_chain/validator_client/attestation_service.nim#L280

if len(aggregateItems) > 0:

But I think I know what the issue here is, Nimbus uses different strategies per API, and if I interpreted them correctly this is why we are seeing the error:

The submitPoolAttestations is only sent to the first / primary beacon node (unless an error is encountered)

beacon_chain/validator_client/attestation_service.nim#L66

await vc.submitPoolAttestations(@[attestation], ApiStrategyKind.First)

While the getAggregatedAttestation is sent to all connected beacon nodes and the "best" response is picked

beacon_chain/validator_client/attestation_service.nim#L283-L284

await vc.getAggregatedAttestation(slot, attestationRoot, ApiStrategyKind.Best)

But Lodestar might not have an aggregated attestation in its cache if the validator client did not previously submit an attestation for attestation_data_root. This error might not happen consistently, though, as the beacon node might prepare the aggregate due to receiving the attestation via gossip (gossipHandlers.ts#L490-L492).

We have just improved the error handling for the case where no aggregated attestation is available (#6648) to follow the spec and produce a less noisy error, as this is somewhat expected to happen in a setup with multiple nodes.

If my analysis is correct, this is also not an issue as Nimbus would still produce the aggregate attestation if it received the data from the primary node.
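The interaction described above can be sketched with a toy model. This is only an illustration under simplifying assumptions: `ToyBeaconNode`, `submit_first`, and `get_aggregate_best` are hypothetical names, gossip is ignored, and "best" is approximated as "the largest successful response", which may not match how Nimbus actually scores responses.

```python
class ToyBeaconNode:
    """Toy stand-in for a beacon node's attestation pool, keyed by data root."""

    def __init__(self, name):
        self.name = name
        self.pool = {}  # data_root -> list of attestations

    def submit_attestation(self, data_root, attestation):
        self.pool.setdefault(data_root, []).append(attestation)

    def get_aggregate(self, data_root):
        if data_root not in self.pool:
            raise LookupError(f"{self.name}: no attestation for dataRoot={data_root}")
        return list(self.pool[data_root])


def submit_first(nodes, data_root, attestation):
    """'First'-style strategy: only the primary node receives the attestation."""
    nodes[0].submit_attestation(data_root, attestation)


def get_aggregate_best(nodes, data_root):
    """'Best'-style strategy: ask every node, keep the largest successful answer."""
    results, errors = [], []
    for node in nodes:
        try:
            results.append(node.get_aggregate(data_root))
        except LookupError as exc:
            errors.append(str(exc))  # the kind of error a node would log
    best = max(results, key=len) if results else None
    return best, errors


primary, fallback = ToyBeaconNode("primary"), ToyBeaconNode("fallback")
submit_first([primary, fallback], "0xabc", "att-1")
best, errors = get_aggregate_best([primary, fallback], "0xabc")
```

In this model the validator client still gets a usable aggregate from the primary node, while the fallback node reports an error for a data root it never saw.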

@barnabasbusa this should not happen in a 1:1 setup without a fallback node. I am assuming that in your tests you had multiple BNs connected to a single VC.

barnabasbusa commented on July 17, 2024

@nflaig We have 1:1 setups only. We haven't considered any 1:x testing.

nflaig commented on July 17, 2024

> @nflaig We have 1:1 setups only. We haven't considered any 1:x testing.

Then I have no idea why this happens, but something to note is that there is also a very strange error which should never happen:

Apr-05 14:20:00.472[rest]            error: Req req-r produceBlockV2 error - REGEN_ERROR_SLOT_BEFORE_BLOCK_SLOT
Error: REGEN_ERROR_SLOT_BEFORE_BLOCK_SLOT

So maybe there was just a clock issue / time skew between the validator client and beacon node?
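To make the clock-skew hypothesis concrete, here is a small sketch of how two processes with slightly different wall clocks can disagree about the current slot near a slot boundary. The genesis timestamp and the skew are made-up values, and `seconds_per_slot=12` is the mainnet default, which a local devnet may override.

```python
def slot_at(now: float, genesis_time: float, seconds_per_slot: int = 12) -> int:
    """Slot that a process with wall clock `now` believes is current."""
    if now < genesis_time:
        raise ValueError("clock is before genesis")
    return int((now - genesis_time) // seconds_per_slot)


genesis = 1_000_000.0                # made-up genesis timestamp
true_now = genesis + 108 * 12 + 1.0  # one second into slot 108
skew = -2.0                          # a clock running two seconds behind

beacon_slot = slot_at(true_now, genesis)         # 108
skewed_slot = slot_at(true_now + skew, genesis)  # 107
```

A process whose clock lags like this could ask for a slot that the other side already considers past, which is the kind of mismatch the error name hints at.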

barnabasbusa commented on July 17, 2024

All tests are run on the same (Docker backend) machine, and everyone should be able to replicate it with the config I have posted above.
I doubt that there would be any sort of clock issues between two containers hosted on the same physical node.

nflaig commented on July 17, 2024

> and everyone should be able to replicate it with the config I have posted above.

Will have to do this next, thanks for the detailed info.

nflaig commented on July 17, 2024

Apr-05 14:20:26.014[rest] error: Req req-5p getAggregatedAttestation error - No attestation for slot=5 dataRoot=0xd947df35713911cac72a977556a795d217eefc7a66bfded54347e18469c675dd
Error: No attestation for slot=5 dataRoot=0xd947df35713911cac72a977556a795d217eefc7a66bfded54347e18469c675dd

The aggregated attestation issue has been fixed by #6668. There still seems to be strange behavior by Nimbus VC that it requests an aggregated attestation for the first 1-5 slots even though it does not submit subnet subscriptions with is_aggregator=true for those slots beforehand. But other than that looks good, attestations, aggregates, and sync committee works as expected.
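For reference, aggregator selection in the phase0 validator spec depends only on the committee size and the validator's slot signature. The sketch below follows that rule in plain Python; the committee sizes and signature bytes used with it are placeholders, not values from this devnet.

```python
import hashlib

TARGET_AGGREGATORS_PER_COMMITTEE = 16  # constant from the phase0 validator spec


def is_aggregator(committee_len: int, slot_signature: bytes) -> bool:
    """Spec-style check: hash the slot signature and reduce it modulo a committee-size factor."""
    modulo = max(1, committee_len // TARGET_AGGREGATORS_PER_COMMITTEE)
    digest = hashlib.sha256(slot_signature).digest()
    # The spec reads the first 8 bytes of the hash as a little-endian uint64.
    return int.from_bytes(digest[:8], "little") % modulo == 0
```

One consequence is that when a committee has fewer than 32 members, `modulo` is 1 and every member is selected as an aggregator, so on a small devnet a validator client may legitimately treat all of its validators as aggregators.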

The only remaining issue is block production, my best guess right now is that Nimbus does not like that Lodestar returns a JSON payload from produceBlockV3 API.

nflaig commented on July 17, 2024

> The only remaining issue is block production, my best guess right now is that Nimbus does not like that Lodestar returns a JSON payload from produceBlockV3 API.

Confirmed: this issue is related to us returning a JSON payload in the response.

ERR 2024-05-29 13:37:55.640+00:00 Beacon node provides unexpected response   reason="Serialization error;200;produceBlockV3(best);unexpected-data" node=http://172.16.0.15:4000[Lodestar/v1.18.1/8b6ecc4] node_index=0 node_roles=AGBSDT
WRN 2024-05-29 13:37:55.645+00:00 Unable to retrieve block data              reason="Serialization error;200;produceBlockV3(best);unexpected-data" wall_slot=12 validator_index=53 service=block_service slot=12 validator=b09cb155@53

Block production with Nimbus VC will be fixed once we merge #6749, but there is another issue with publishing blocks as SSZ (while JSON seems to work fine there). It does not seem to be a problem with other clients, and there is an issue on the Nimbus side to track this: status-im/nimbus-eth2#6205 (this is an issue on the Nimbus BN).
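For anyone who wants to check which encoding a beacon node returns from produceBlockV3, a small script along these lines could help. It is a sketch only: the helper names are hypothetical, the base URL and `randao_reveal` in the usage note are placeholders, and a real node will normally reject a request whose `randao_reveal` is not a valid signature.

```python
import urllib.request


def build_produce_block_v3_request(base_url: str, slot: int, randao_reveal: str,
                                   accept: str = "application/octet-stream"):
    """Build (but do not send) a produceBlockV3 request asking for the given encoding."""
    url = f"{base_url}/eth/v3/validator/blocks/{slot}?randao_reveal={randao_reveal}"
    return urllib.request.Request(url, headers={"Accept": accept})


def classify_encoding(content_type: str) -> str:
    """Map a Content-Type header value to the encoding the body is expected to use."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/octet-stream":
        return "ssz"
    if media_type == "application/json":
        return "json"
    return "unknown"
```

Sending the request with `urllib.request.urlopen(build_produce_block_v3_request("http://localhost:9596", 12, "0x..."))` and passing the response's `Content-Type` header to `classify_encoding` would show whether the node answered an SSZ request with JSON.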
