This change significantly improves the performance of RLPx message reads
and writes. In the previous implementation, reading and writing of
message frames performed multiple reads and writes on the underlying
network connection, and allocated a new []byte buffer for every read.
In the new implementation, reads and writes re-use buffers, and perform
much fewer system calls on the underlying connection. This doubles the
theoretically achievable throughput on a single connection, as shown by
the benchmark result:
name old speed new speed delta
Throughput-8 70.3MB/s ± 0% 155.4MB/s ± 0% +121.11% (p=0.000 n=9+8)
The change also removes support for the legacy, pre-EIP-8 handshake encoding.
As of May 2021, no actively maintained client sends this format.
This removes the error log message that says
Ethereum peer removal failed ... err="peer not registered"
The error happened because removePeer was called multiple
times: once to disconnect the peer, and another time when the
handler exited. With this change, removePeer now has the sole
purpose of disconnecting the peer. Unregistering happens exactly
once, when the handler exits.
This change extracts the peer QoS tracking logic from eth/downloader, moving
it into the new package p2p/msgrate. The job of msgrate.Tracker is determining
suitable timeout values and request sizes per peer.
The snap sync scheduler now uses msgrate.Tracker instead of the hard-coded 15s
timeout. This should make the sync work better on network links with high latency.
This changes the definitions of Ping and Pong, adding an optional field
for the sequence number. This field was previously encoded/decoded using
the "tail" struct tag, but using "optional" is much nicer.
This removes auto-configuration of the snap.*.ethdisco.net DNS discovery tree.
Since measurements have shown that > 75% of nodes in all.*.ethdisco.net support
snap, we have decided to retire the dedicated index for snap and just use the eth
tree instead.
The dial iterators of eth and snap now use the same DNS tree in the default configuration,
so both iterators should use the same DNS discovery client instance. This ensures that
the record cache and rate limit are shared. Records will not be requested multiple times.
While testing the change, I noticed that duplicate DNS requests do happen even
when the client instance is shared. This is because the two iterators request the tree
root, link tree root, and first levels of the tree in lockstep. To avoid this problem, the
change also adds a singleflight.Group instance in the client. When one iterator
attempts to resolve an entry which is already being resolved, the singleflight object
waits for the existing resolve call to finish and returns the entry to both places.
When receiving PING from an IPv4 address over IPv6, the implementation sent
back a IPv4-in-IPv6 address. This change makes it reflect the IPv4 address.
* eth/protocols, prp/tracker: add support for req/rep rtt tracking
* p2p/tracker: sanity cap the number of pending requests
* pap/tracker: linter <3
* p2p/tracker: disable entire tracker if no metrics are enabled
This fixes the calculation of the tree branch factor. With the new
formula, we now creat at most 13 children instead of 30, ensuring
the TXT record size will be below 370 bytes.
This PR implements the first one of the "lespay" UDP queries which
is already useful in itself: the capacity query. The server pool is making
use of this query by doing a cheap UDP query to determine whether it is
worth starting the more expensive TCP connection process.
In the random sync algorithm used by the DNS node iterator, we first pick a random
tree and then perform one sync action on that tree. This happens in a loop until any
node is found. If no trees contain any nodes, the iterator will enter a hot loop spinning
at 100% CPU.
The fix is complicated. The iterator now checks if a meaningful sync action can
be performed on any tree. If there is nothing to do, it waits for the next root record
recheck time to arrive and then tries again.
Fixes#22306
Prevents a situation where we (not running snap) connects with a peer running snap, and get stalled waiting for snap registration to succeed (which will never happen), which cause a waitgroup wait to halt shutdown
This PR enables running the new discv5 protocol in both LES client
and server mode. In client mode it mixes discv5 and dnsdisc iterators
(if both are enabled) and filters incoming ENRs for "les" tag and fork ID.
The old p2p/discv5 package and all references to it are removed.
Co-authored-by: Felix Lange <fjl@twurst.com>
USB enumeration still occured. Make sure it will only occur if --usb is set.
This also deprecates the 'NoUSB' config file option in favor of a new option 'USB'.
The database panicked for invalid IPs. This is usually no problem
because all code paths leading to node DB access verify the IP, but it's
dangerous because improper validation can turn this panic into a DoS
vulnerability. The quick fix here is to just turn database accesses
using invalid IP into a noop. This isn't great, but I'm planning to
remove the node DB for discv5 long-term, so it should be fine to have
this quick fix for half a year.
Fixes#21849
This PR fixes a deadlock reported here: #21925
The cause is that many operations may be pending, but if the close happens, only one of them gets awoken and exits, the others remain waiting for a signal that never comes.
This fixes a deadlock that could occur when a response packet arrived
after a call had already received enough responses and was about to
signal completion to the dispatch loop.
Co-authored-by: Felix Lange <fjl@twurst.com>
- Remove the ws:// prefix from the status endpoint since
the ws:// is already included in the stack.WSEndpoint().
- Don't register the services again in the node start.
Registration is already done in the initialization stage.
- Expose admin namespace via websocket.
This namespace is necessary for connecting the peers via websocket.
- Offer logging relevant options for exec adapter.
It's really painful to mix all log output in the single console. So
this PR offers two additional options for exec adapter in this case
testers can config the log output(e.g. file output) and log level
for each p2p node.
This adds a few tiny fixes for les and the p2p simulation framework:
LES Parts
- Keep the LES-SERVER connection even it's non-synced
We had this idea to reject the connections in LES protocol if the les-server itself is
not synced. However, in LES protocol we will also receive the connection from another
les-server. In this case even the local node is not synced yet, we should keep the tcp
connection for other protocols(e.g. eth protocol).
- Don't count "invalid message" for non-existing GetBlockHeadersMsg request
In the eth syncing mechanism (full sync, fast sync, light sync), it will try to fetch
some non-existent blocks or headers(to ensure we indeed download all the missing chain).
In this case, it's possible that the les-server will receive the request for
non-existent headers. So don't count it as the "invalid message" for scheduling
dropping.
- Copy the announce object in the closure
Before the les-server pushes the latest headers to all connected clients, it will create
a closure and queue it in the underlying request scheduler. In some scenarios it's
problematic. E.g, in private networks, the block can be mined very fast. So before the
first closure is executed, we may already update the latest_announce object. So actually
the "announce" object we want to send is replaced.
The downsize is the client will receive two announces with the same td and then drop the
server.
P2P Simulation Framework
- Don't double register the protocol services in p2p-simulation "Start".
The protocols upon the devp2p are registered in the "New node stage". So don't reigster
them again when starting a node in the p2p simulation framework
- Add one more new config field "ExternalSigner", in order to use clef service in the
framework.
* peer: return localAddr instead of name to prevent spam
We currently use the name (which can be freely set by the peer) in several log messages.
This enables malicious actors to write spam into your geth log.
This commit returns the localAddr instead of the freely settable name.
* p2p: reduce usage of peer.Name in warn messages
* eth, p2p: use truncated names
* Update peer.go
Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de>
Co-authored-by: Felix Lange <fjl@twurst.com>
For some reason, using the shared hash causes a cryptographic incompatibility
when using Go 1.15. I noticed this during the development of Discovery v5.1
when I added test vector verification.
The go library commit that broke this is golang/go@97240d5, but the
way we used HKDF is slightly dodgy anyway and it's not a regression.
This change moves the RLPx protocol implementation into a separate package,
p2p/rlpx. The new package can be used to establish RLPx connections for
protocol testing purposes.
Co-authored-by: Felix Lange <fjl@twurst.com>
This PR adds an extra guarantee to NodeStateMachine: it ensures that all
immediate effects of a certain change are processed before any subsequent
effects of any of the immediate effects on the same node. In the original
version, if a cascaded change caused a subscription callback to be called
multiple times for the same node then these calls might have happened in a
wrong chronological order.
For example:
- a subscription to flag0 changes flag1 and flag2
- a subscription to flag1 changes flag3
- a subscription to flag1, flag2 and flag3 was called in the following order:
[flag1] -> [flag1, flag3]
[] -> [flag1]
[flag1, flag3] -> [flag1, flag2, flag3]
This happened because the tree of changes was traversed in a "depth-first
order". Now it is traversed in a "breadth-first order"; each node has a
FIFO queue for pending callbacks and each triggered subscription callback
is added to the end of the list. The already existing guarantees are
retained; no SetState or SetField returns until the callback queue of the
node is empty again. Just like before, it is the responsibility of the
state machine design to ensure that infinite state loops are not possible.
Multiple changes affecting the same node can still happen simultaneously;
in this case the changes can be interleaved in the FIFO of the node but the
correct order is still guaranteed.
A new unit test is also added to verify callback order in the above scenario.
This change improves discovery behavior in small networks. Very small
networks would often fail to bootstrap because all member nodes were
dropping table content due to findnode failure. The check is now changed
to avoid dropping nodes on findnode failure when their bucket is almost
empty. It also relaxes the liveness check requirement for FINDNODE/v4
response nodes, returning unverified nodes as results when there aren't
any verified nodes yet.
The "findnode failed" log now reports whether the node was dropped
instead of the number of results. The value of the "results" was
always zero by definition.
Co-authored-by: Felix Lange <fjl@twurst.com>
This adds a lock around requests because some routers can't handle
concurrent requests. Requests are also rate-limited.
The Map function request a new mapping exactly when the map timeout
occurs instead of 5 minutes earlier. This should prevent duplicate mappings.
This PR significantly changes the APIs for instantiating Ethereum nodes in
a Go program. The new APIs are not backwards-compatible, but we feel that
this is made up for by the much simpler way of registering services on
node.Node. You can find more information and rationale in the design
document: https://gist.github.com/renaynay/5bec2de19fde66f4d04c535fd24f0775.
There is also a new feature in Node's Go API: it is now possible to
register arbitrary handlers on the user-facing HTTP server. In geth, this
facility is used to enable GraphQL.
There is a single minor change relevant for geth users in this PR: The
GraphQL API is no longer available separately from the JSON-RPC HTTP
server. If you want GraphQL, you need to enable it using the
./geth --http --graphql flag combination.
The --graphql.port and --graphql.addr flags are no longer available.