Hi all,
As more of our sharding client code is being created in our fork, it is critical to understand the design considerations of the current Ethereum nodes baked into go-ethereum. In particular, our notary/proposer clients need to be designed with good event loop management, pluggable services, and solid entry points for p2p functionality built in. As a case study, we will look at light sync nodes as they are currently implemented in geth, understand their full responsibilities, and figure out the bigger picture behind the design considerations of their architecture.
The key question we will be asking ourselves is: what exactly happens when we start a light client? What are the design considerations that came into play when designing the code that gets the light client to work?
We will cap off this document by determining what aspects of the protocols in geth we can use as part of our sharding clients. We have an opportunity to write clean, straightforward code that does not have a massive number of file dependencies and complicated configs as geth currently does.
Let’s dive in.
Case Study: Light Client Nodes
Ethereum’s light client sync mode allows users to spin up a geth node that only downloads block headers and relies on merkle proofs to verify specific parts of the state tree as needed. Light peers are extremely commonplace and critical components in the Ethereum network today. Their architecture serves as a great starting point for anyone extending or redesigning geth in a secure, concurrent, and performant way.
Unfortunately, the current geth code is very hard to read, has a ton of dependencies across packages, and contains obscure configuration options. This doc will attempt to explain light client sync from start to finish, light node peer-to-peer networking, and other responsibilities of the protocol.
How is a Light Node Triggered?
Launching a geth light node is as easy as:
$ geth --syncmode="light"
Upon the command being executed, the main function within go-ethereum/cmd/geth/main.go runs as follows:
func main() {
	if err := app.Run(os.Args); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
This triggers the urfave/cli external package’s Run function, which will trigger the geth function a few lines below main().
func geth(ctx *cli.Context) error {
	node := makeFullNode(ctx)
	startNode(ctx, node)
	node.Wait()
	return nil
}
Based on the cli context, this function initializes a node instance, which is a critical entry point. Let’s take a look at how makeFullNode does this.
In go-ethereum/cmd/geth/config.go:
func makeFullNode(ctx *cli.Context) *node.Node {
	stack, cfg := makeConfigNode(ctx)
	utils.RegisterEthService(stack, &cfg.Eth)
	// a bunch of other services are configured below…
	…
	// then it returns the node, which is a var called a “stack”,
	// representing a protocol stack of the node (i.e. p2p services, rpc, etc.).
	return stack
}
Two important functions are at play here:
makeConfigNode uses the cli context to fetch the relevant command line flags and returns a node instance along with a configuration object.
utils.RegisterEthService uses the configuration options derived from the command line flags to add a Service object to the node instance we just declared above. In this case, the cli context contains the --syncmode="light" flag, which we will be using to set up a light client protocol instead of a full Ethereum node.
Let's see makeConfigNode in go-ethereum/cmd/geth/config.go:
func makeConfigNode(ctx *cli.Context) (*node.Node, gethConfig) {
	// Load defaults.
	cfg := gethConfig{
		Eth:       eth.DefaultConfig,
		Shh:       whisper.DefaultConfig,
		Node:      defaultNodeConfig(),
		Dashboard: dashboard.DefaultConfig,
	}
	// Load config file.
	if file := ctx.GlobalString(configFileFlag.Name); file != "" {
		if err := loadConfig(file, &cfg); err != nil {
			utils.Fatalf("%v", err)
		}
	}
	// Apply flags.
	utils.SetNodeConfig(ctx, &cfg.Node)
	stack, err := node.New(&cfg.Node)
	if err != nil {
		utils.Fatalf("Failed to create the protocol stack: %v", err)
	}
	utils.SetEthConfig(ctx, stack, &cfg.Eth)
	if ctx.GlobalIsSet(utils.EthStatsURLFlag.Name) {
		cfg.Ethstats.URL = ctx.GlobalString(utils.EthStatsURLFlag.Name)
	}
	utils.SetShhConfig(ctx, stack, &cfg.Shh)
	utils.SetDashboardConfig(ctx, &cfg.Dashboard)
	return stack, cfg
}
Cool, so this function just sets up some basic, default configurations to start a node. The eth.DefaultConfig value it loads contains options that will look familiar from the Ethereum network:
var DefaultConfig = Config{
	SyncMode: downloader.FastSync,
	Ethash: ethash.Config{
		CacheDir:       "ethash",
		CachesInMem:    2,
		CachesOnDisk:   3,
		DatasetsInMem:  1,
		DatasetsOnDisk: 2,
	},
	NetworkId:     1,
	LightPeers:    100,
	DatabaseCache: 768,
	TrieCache:     256,
	TrieTimeout:   5 * time.Minute,
	GasPrice:      big.NewInt(18 * params.Shannon),
	TxPool:        core.DefaultTxPoolConfig,
	GPO: gasprice.Config{
		Blocks:     20,
		Percentile: 60,
	},
}
The utils.SetEthConfig(ctx, stack, &cfg.Eth) line is what modifies cfg based on command line flags. In this case, because --syncmode is set to light, the config's SyncMode is updated to reflect that flag. Then, we go into the actual code that initializes a Light Protocol instance and registers it as the node's ETH service.
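The "defaults first, flags second" pattern described above can be sketched in isolation. This is only a minimal illustration using the standard library's flag package rather than urfave/cli; SyncMode, Config, defaultConfig, and applyFlags are hypothetical stand-ins, not geth's actual types or functions.

```go
package main

import (
	"flag"
	"fmt"
)

// SyncMode and Config are hypothetical stand-ins for geth's
// downloader.SyncMode and eth.Config; they are not the real types.
type SyncMode string

const (
	FastSync  SyncMode = "fast"
	LightSync SyncMode = "light"
)

type Config struct {
	SyncMode   SyncMode
	NetworkID  uint64
	LightPeers int
}

// defaultConfig plays the role of a DefaultConfig value: a fully
// populated baseline that flags may later override.
func defaultConfig() Config {
	return Config{SyncMode: FastSync, NetworkID: 1, LightPeers: 100}
}

// applyFlags parses args and overrides only the fields whose flags were set.
func applyFlags(cfg Config, args []string) (Config, error) {
	fs := flag.NewFlagSet("node", flag.ContinueOnError)
	syncMode := fs.String("syncmode", "", "sync mode: fast or light")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}
	switch SyncMode(*syncMode) {
	case "":
		// Flag not provided: keep the default.
	case FastSync, LightSync:
		cfg.SyncMode = SyncMode(*syncMode)
	default:
		return cfg, fmt.Errorf("unknown sync mode %q", *syncMode)
	}
	return cfg, nil
}

func main() {
	cfg, err := applyFlags(defaultConfig(), []string{"--syncmode=light"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg.SyncMode, cfg.NetworkID, cfg.LightPeers)
}
```

Only SyncMode changes here; every other field keeps its default, which is the behavior the walkthrough attributes to SetEthConfig.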
In go-ethereum/cmd/utils/flags.go:
// RegisterEthService adds an Ethereum client to the stack.
func RegisterEthService(stack *node.Node, cfg *eth.Config) {
	var err error
	if cfg.SyncMode == downloader.LightSync {
		err = stack.Register(func(ctx *node.ServiceContext) (node.Service, error) {
			return les.New(ctx, cfg)
		})
	} else {
		err = stack.Register(func(ctx *node.ServiceContext) (node.Service, error) {
			fullNode, err := eth.New(ctx, cfg)
			if fullNode != nil && cfg.LightServ > 0 {
				ls, _ := les.NewLesServer(fullNode, cfg)
				fullNode.AddLesServer(ls)
			}
			return fullNode, err
		})
	}
	if err != nil {
		Fatalf("Failed to register the Ethereum service: %v", err)
	}
}
So here, if the config's SyncMode is set to LightSync, which happened in the makeConfigNode function we saw before, we register a constructor for a Service object with the node (referred to as stack in the code above). Nodes keep a collection of Service instances that all implement useful functions we will come back to later. In this case, the service is a LightEthereum instance that gives us all the functionality we need to run a light client.
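To make the registration step concrete, here is a small sketch of a stack that records service constructors and only builds the services later. Service, ServiceConstructor, Stack, and the two toy services are simplified, hypothetical stand-ins (the real node.Service interface has different methods), so treat this as an illustration of the pattern rather than geth's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// Service is a deliberately tiny, hypothetical interface for this sketch.
type Service interface {
	Name() string
}

// ServiceConstructor builds a service lazily, when the stack is assembled.
type ServiceConstructor func() (Service, error)

// Stack is a hypothetical stand-in for a node holding registered constructors.
type Stack struct {
	running      bool
	serviceFuncs []ServiceConstructor
}

var errRunning = errors.New("stack already running")

// Register only records the constructor; nothing is built yet.
func (s *Stack) Register(c ServiceConstructor) error {
	if s.running {
		return errRunning
	}
	s.serviceFuncs = append(s.serviceFuncs, c)
	return nil
}

type lightService struct{}

func (lightService) Name() string { return "les" }

type fullService struct{}

func (fullService) Name() string { return "eth" }

// registerEthService picks exactly one constructor based on the sync mode.
func registerEthService(s *Stack, light bool) error {
	if light {
		return s.Register(func() (Service, error) { return lightService{}, nil })
	}
	return s.Register(func() (Service, error) { return fullService{}, nil })
}

// build runs every registered constructor in registration order and
// marks the stack as running so later registrations are refused.
func (s *Stack) build() ([]Service, error) {
	services := make([]Service, 0, len(s.serviceFuncs))
	for _, construct := range s.serviceFuncs {
		svc, err := construct()
		if err != nil {
			return nil, err
		}
		services = append(services, svc)
	}
	s.running = true
	return services, nil
}

func main() {
	stack := &Stack{}
	if err := registerEthService(stack, true); err != nil {
		fmt.Println("register failed:", err)
		return
	}
	services, err := stack.build()
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	fmt.Println(services[0].Name())
}
```

Deferring construction this way lets the stack decide when services come to life, which is what makes them pluggable.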
How Do These Attached Services Start Running?
Here's where everything actually ties together. If you go back to the geth function in go-ethereum/cmd/geth/main.go,
func geth(ctx *cli.Context) error {
	node := makeFullNode(ctx)
	startNode(ctx, node)
	node.Wait()
	return nil
}
the startNode func actually kicks things off.
// startNode boots up the system node and all registered protocols, after which
// it unlocks any requested accounts, and starts the RPC/IPC interfaces and the
// miner.
func startNode(ctx *cli.Context, stack *node.Node) {
	// Start up the node itself
	utils.StartNode(stack)
	// a lot of stuff below is related to wallet opening/closing events and setting up
	// full node mining functionality...
	...
}
When we look at utils.StartNode in go-ethereum/cmd/utils/cmd.go:
func StartNode(stack *node.Node) {
	if err := stack.Start(); err != nil {
		Fatalf("Error starting protocol stack: %v", err)
	}
	// stuff below handles signal interrupts to stop the service...
	...
}
...we see the actual code that starts off a node! Let's explore. In go-ethereum/node/node.go, a lot of things happen (simplified for readability):
func (n *Node) Start() error {
	n.lock.Lock()
	defer n.lock.Unlock()
	// Short circuit if the node's already running
	if n.server != nil {
		return ErrNodeRunning
	}
	if err := n.openDataDir(); err != nil {
		return err
	}
	// Initialize the p2p server. This creates the node key and
	// discovery databases.
	n.serverConfig = n.config.P2P
	n.serverConfig.PrivateKey = n.config.NodeKey()
	n.serverConfig.Name = n.config.NodeName()
	n.serverConfig.Logger = n.log
	// setting up more config stuff...
	...
	// sets up a peer to peer server instance!
	running := &p2p.Server{Config: n.serverConfig}
	n.log.Info("Starting peer-to-peer node", "instance", n.serverConfig.Name)
	services := make(map[reflect.Type]Service)
	// serviceFuncs is an internal slice updated in a node whenever node.Register() is called!
	for _, constructor := range n.serviceFuncs {
		// Create a new context for the particular service
		ctx := &ServiceContext{
			config:         n.config,
			services:       make(map[reflect.Type]Service),
			EventMux:       n.eventmux,
			AccountManager: n.accman,
		}
		// does some stuff for threaded access...
		...
		// Construct and save the service
		service, err := constructor(ctx)
		// handles any error and determines the service's kind (its key in the services map)...
		...
		// updates the services map
		services[kind] = service
	}
	// this uses the .Protocols() method of each attached service (yes, LightEthereum has this defined)
	// and attaches the result to the running p2p server instance.
	for _, service := range services {
		running.Protocols = append(running.Protocols, service.Protocols()...)
	}
	// this starts the p2p server!
	if err := running.Start(); err != nil {
		...
	}
	// Start each of the services
	for kind, service := range services {
		// Start the next service, stopping all previous upon failure
		if err := service.Start(running); err != nil {
			...
		}
	}
	// code below starts some RPC stuff and cleans up the node when it exits...
	return nil
}
Aha! So this is the function that iterates over each attached service and runs the .Start() function for each! The LightEthereum instance that was attached as a service to the node implements the Service interface that contains a .Start() function. This is how it all fits together!
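The "start the next service, stopping all previous upon failure" behavior is worth isolating, since our own clients will likely want the same guarantee. Below is a minimal sketch under simplifying assumptions: the Service interface here has only Start and Stop, services live in an ordered slice rather than a map keyed by type, and startAll and recorder are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// Service is a reduced, hypothetical version of a lifecycle interface.
type Service interface {
	Start() error
	Stop() error
}

// startAll starts services in order. If one fails to start, the services
// that already started are stopped in reverse order and the error is returned.
func startAll(services []Service) error {
	started := make([]Service, 0, len(services))
	for _, s := range services {
		if err := s.Start(); err != nil {
			for i := len(started) - 1; i >= 0; i-- {
				started[i].Stop() // best-effort rollback; Stop errors are ignored here
			}
			return err
		}
		started = append(started, s)
	}
	return nil
}

// recorder is a toy service that logs its lifecycle calls.
type recorder struct {
	name string
	fail bool
	log  *[]string
}

func (r *recorder) Start() error {
	if r.fail {
		return errors.New(r.name + " failed to start")
	}
	*r.log = append(*r.log, "start "+r.name)
	return nil
}

func (r *recorder) Stop() error {
	*r.log = append(*r.log, "stop "+r.name)
	return nil
}

func main() {
	var log []string
	err := startAll([]Service{
		&recorder{name: "a", log: &log},
		&recorder{name: "b", log: &log},
		&recorder{name: "c", fail: true, log: &log},
	})
	fmt.Println(log, err)
}
```

Because the third service fails, the log ends with "stop b" and "stop a", so no half-started stack is left behind.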
The Light Ethereum Package
We will be focusing our attention on the go-ethereum/les package in this section, as this is the service that is attached to the running node upon launching a geth instance with the --syncmode="light" flag.
The light client needs to implement the Service interface defined in go-ethereum/node/service.go as follows:
type Service interface {
	// Protocols retrieves the P2P protocols the service wishes to start.
	Protocols() []p2p.Protocol
	// APIs retrieves the list of RPC descriptors the service provides.
	APIs() []rpc.API
	// Start is called after all services have been constructed and the networking
	// layer was also initialized to spawn any goroutines required by the service.
	Start(server *p2p.Server) error
	// Stop terminates all goroutines belonging to the service, blocking until they
	// are all terminated.
	Stop() error
}
The core of the entire light client is written in go-ethereum/les/backend.go. This is where we find the functions required to satisfy this Service interface, alongside the code that initializes an actual LightEthereum instance in a function called New.
func New(ctx *node.ServiceContext, config *eth.Config) (*LightEthereum, error) {
	// sets up the chainDB and genesis configuration for the light node...
	chainDb, err := eth.CreateDB(ctx, config, "lightchaindata")
	if err != nil {
		return nil, err
	}
	chainConfig, genesisHash, genesisErr := core.SetupGenesisBlock(chainDb, config.Genesis)
	...
	log.Info("Initialised chain configuration", "config", chainConfig)
	leth := &LightEthereum{
		...
	}
	// sets up a transaction relayer, a server pool, and info retrieval systems
	leth.relay = NewLesTxRelay(peers, leth.reqDist)
	leth.serverPool = newServerPool(chainDb, quitSync, &leth.wg)
	leth.retriever = newRetrieveManager(peers, leth.reqDist, leth.serverPool)
	...
	// sets up the light tx pool
	leth.txPool = light.NewTxPool(leth.chainConfig, leth.blockchain, leth.relay)
	// sets up a protocol manager: we'll get into this shortly...
	if leth.protocolManager, err = NewProtocolManager(...); err != nil {
		return nil, err
	}
	// sets up the light ethereum APIs for RPC interactions
	leth.ApiBackend = &LesApiBackend{leth, nil}
	...
	return leth, nil
}
Let's see what the light client's .Start() function does and how it sets up the p2p stack:
func (s *LightEthereum) Start(srvr *p2p.Server) error {
	...
	log.Warn("Light client mode is an experimental feature")
	s.netRPCService = ethapi.NewPublicNetAPI(srvr, s.networkId)
	...
	s.serverPool.start(srvr, lesTopic(s.blockchain.Genesis().Hash(), protocolVersion))
	...
	return nil
}
Light Protocol Event Loop
The creation of the LightEthereum instance kicks off a bunch of goroutines, but the actual sync and retrieval of state is driven by the ProtocolManager created in the New function.
In go-ethereum/les/handler.go, at the bottom of the NewProtocolManager function, we see code that sets up some event loops:
if lightSync {
	manager.downloader = downloader.New(downloader.LightSync, chainDb, manager.eventMux, nil, blockchain, removePeer)
	manager.peers.notify((*downloaderPeerNotify)(manager))
	manager.fetcher = newLightFetcher(manager)
}
In this case, the manager creates a new downloader instance and a new light fetcher via newLightFetcher, which work in tandem with the p2p layer to sync the state and respond to RPC requests that trigger events on peers or respond to incoming messages from peers.
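For readers less familiar with Go event loops, the general shape of such a component can be sketched as a goroutine that selects over incoming messages and a quit channel. This is not the actual lightFetcher; fetcher, announcement, and all method names below are hypothetical and only illustrate the pattern.

```go
package main

import (
	"fmt"
	"sync"
)

// announcement is a hypothetical message: a peer advertising a new head number.
type announcement struct {
	peer   string
	number uint64
}

// fetcher is a toy component that tracks the best announced head.
type fetcher struct {
	announceCh chan announcement
	quit       chan struct{}
	wg         sync.WaitGroup

	mu   sync.Mutex
	best uint64
}

// newFetcher creates the fetcher and starts its event loop goroutine.
func newFetcher() *fetcher {
	f := &fetcher{
		announceCh: make(chan announcement),
		quit:       make(chan struct{}),
	}
	f.wg.Add(1)
	go f.loop()
	return f
}

// loop handles one event at a time until quit is closed.
func (f *fetcher) loop() {
	defer f.wg.Done()
	for {
		select {
		case ann := <-f.announceCh:
			f.mu.Lock()
			if ann.number > f.best {
				f.best = ann.number
			}
			f.mu.Unlock()
		case <-f.quit:
			return
		}
	}
}

// announce hands a message to the loop. It returns false if the
// fetcher has already been stopped.
func (f *fetcher) announce(a announcement) bool {
	select {
	case f.announceCh <- a:
		return true
	case <-f.quit:
		return false
	}
}

// stop terminates the loop and blocks until it has exited.
// It must be called at most once.
func (f *fetcher) stop() {
	close(f.quit)
	f.wg.Wait()
}

// bestHead returns the highest head number seen so far.
func (f *fetcher) bestHead() uint64 {
	f.mu.Lock()
	defer f.mu.Unlock()
	return f.best
}

func main() {
	f := newFetcher()
	f.announce(announcement{peer: "peer-1", number: 5})
	f.announce(announcement{peer: "peer-2", number: 12})
	f.announce(announcement{peer: "peer-1", number: 9})
	f.stop() // after stop returns, every accepted announcement has been handled
	fmt.Println(f.bestHead())
}
```

The stop method mirrors the contract of Service.Stop quoted earlier: it blocks until the goroutine it owns has terminated.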
The implementation diverges into a variety of files at this point, but an important aspect of the les package is the usage of on-demand requests, or ODRs. Through the p2p light server, nodes receive requests that are processed via goroutines such as in the example below.
In go-ethereum/les/odr_requests.go:
func (r *TrieRequest) Validate(db ethdb.Database, msg *Msg) error {
	log.Debug("Validating trie proof", "root", r.Id.Root, "key", r.Key)
	switch msg.MsgType {
	case MsgProofsV1:
		proofs := msg.Obj.([]light.NodeList)
		if len(proofs) != 1 {
			return errInvalidEntryCount
		}
		nodeSet := proofs[0].NodeSet()
		// Verify the proof and store if checks out
		if _, err, _ := trie.VerifyProof(r.Id.Root, r.Key, nodeSet); err != nil {
			return fmt.Errorf("merkle proof verification failed: %v", err)
		}
		r.Proof = nodeSet
		return nil
	case MsgProofsV2:
		proofs := msg.Obj.(light.NodeList)
		// Verify the proof and store if checks out
		nodeSet := proofs.NodeSet()
		reads := &readTraceDB{db: nodeSet}
		if _, err, _ := trie.VerifyProof(r.Id.Root, r.Key, reads); err != nil {
			return fmt.Errorf("merkle proof verification failed: %v", err)
		}
		// check if all nodes have been read by VerifyProof
		if len(reads.reads) != nodeSet.KeyCount() {
			return errUselessNodes
		}
		r.Proof = nodeSet
		return nil
	default:
		return errInvalidMessageType
	}
}
The node in question has the capacity to immediately respond to a message received via other peers, which is a critical piece of functionality we will need the more we elaborate on our notary/proposer clients.
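The idea behind validating a proof against a trusted root can be shown with a much simpler structure than geth's Merkle Patricia trie. The sketch below uses a plain binary Merkle tree over SHA-256, which is a deliberate substitution for illustration; proofStep, hashPair, verifyProof, and exampleTree are hypothetical names and not part of the les or trie packages.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// proofStep is one sibling hash on the path from a leaf up to the root.
type proofStep struct {
	sibling [32]byte
	left    bool // true if the sibling is the left child at this level
}

// hashPair hashes the concatenation of a left and a right child hash.
func hashPair(l, r [32]byte) [32]byte {
	return sha256.Sum256(append(l[:], r[:]...))
}

// verifyProof recomputes the root from the leaf and the sibling hashes,
// then compares it with the root the light client already trusts.
func verifyProof(root [32]byte, leaf []byte, proof []proofStep) bool {
	h := sha256.Sum256(leaf)
	for _, step := range proof {
		if step.left {
			h = hashPair(step.sibling, h)
		} else {
			h = hashPair(h, step.sibling)
		}
	}
	return h == root
}

// exampleTree builds a four-leaf tree over "a", "b", "c", "d" and returns
// its root together with the proof for leaf "c".
func exampleTree() ([32]byte, []proofStep) {
	ha := sha256.Sum256([]byte("a"))
	hb := sha256.Sum256([]byte("b"))
	hc := sha256.Sum256([]byte("c"))
	hd := sha256.Sum256([]byte("d"))
	left := hashPair(ha, hb)
	right := hashPair(hc, hd)
	root := hashPair(left, right)
	proof := []proofStep{
		{sibling: hd, left: false}, // "d" sits to the right of "c"
		{sibling: left, left: true}, // the a/b subtree sits to the left
	}
	return root, proof
}

func main() {
	root, proof := exampleTree()
	fmt.Println(verifyProof(root, []byte("c"), proof))
	fmt.Println(verifyProof(root, []byte("tampered"), proof))
}
```

The verifier only needs the trusted root and a logarithmic number of hashes, which is why a header-only client can still check individual pieces of state.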
Key Takeaways
Overall, taking full advantage of Go's concurrency primitives along with mutexes for managing services is a great benefit of working with the geth client. We should maintain the pluggability of services via a Service-like interface and allow for easy management and testing of relevant code.
What we should avoid, however, is the tightly coupled spaghetti code around configuration options. There is a lot of heterogeneity around configuring structs in the geth client, with packages often following their own approaches compared to others throughout the project. We should aim to constrain all configuration to a single, initial entrypoint and avoid redundancy of .Start() methods. After reading this code, it often feels like the geth team really drove themselves into a corner here. We have the opportunity to keep things simple, DRY, and performant.
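As one possible starting point for discussion, a single configuration entrypoint could look roughly like the sketch below: one Config struct, one place for defaults, and one Validate call before any service is constructed. Every name and default value here (Config, P2PConfig, ShardConfig, Load, the port, the peer count) is a hypothetical placeholder, not an existing API in our fork.

```go
package main

import (
	"errors"
	"fmt"
)

// P2PConfig and ShardConfig are hypothetical sub-configs, each handed
// to exactly one service so services never reach into each other's settings.
type P2PConfig struct {
	ListenAddr string
	MaxPeers   int
}

type ShardConfig struct {
	ShardID int
	Role    string // "notary" or "proposer"
}

// Config is the single top-level configuration object.
type Config struct {
	P2P   P2PConfig
	Shard ShardConfig
}

// DefaultConfig returns a fully populated baseline (values are placeholders).
func DefaultConfig() Config {
	return Config{
		P2P:   P2PConfig{ListenAddr: ":30303", MaxPeers: 25},
		Shard: ShardConfig{ShardID: 0, Role: "notary"},
	}
}

// Validate is the one place where configuration is checked.
func (c Config) Validate() error {
	if c.P2P.MaxPeers <= 0 {
		return errors.New("p2p: MaxPeers must be positive")
	}
	if c.Shard.ShardID < 0 {
		return errors.New("shard: ShardID must not be negative")
	}
	switch c.Shard.Role {
	case "notary", "proposer":
		return nil
	default:
		return fmt.Errorf("shard: unknown role %q", c.Shard.Role)
	}
}

// Load applies overrides on top of the defaults and validates the result once.
func Load(overrides ...func(*Config)) (Config, error) {
	cfg := DefaultConfig()
	for _, override := range overrides {
		override(&cfg)
	}
	if err := cfg.Validate(); err != nil {
		return Config{}, err
	}
	return cfg, nil
}

func main() {
	cfg, err := Load(func(c *Config) { c.Shard.Role = "proposer" })
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println(cfg.Shard.Role, cfg.P2P.MaxPeers)
}
```

Whether overrides come from flags, a file, or tests, they all pass through Load, so there is only one path on which configuration can go wrong.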
We have to leverage the powerful constructs shown above in our notary/proposer implementations to make the most out of Go. Please let me know your thoughts below as to how we can improve upon what the go-ethereum team has done.
Let's go for it.