Node discovery periodically revalidates the nodes in its table by sending PING, checking
if they are still alive. I recently noticed some issues with the implementation of this
process, which can cause strange results such as nodes dropping unexpectedly, certain
nodes not getting revalidated often enough, and bad results being returned to incoming
FINDNODE queries.
In this change, the revalidation process is improved with the following logic:
- We maintain two 'revalidation lists' containing the table nodes, named 'fast' and 'slow'.
- The process chooses random nodes from each list on a randomized interval, the interval being
faster for the 'fast' list, and performs revalidation for the chosen node.
- Whenever a node is newly inserted into the table, it goes into the 'fast' list.
Once validation passes, it transfers to the 'slow' list. If a request fails, or the
node changes endpoint, it transfers back into 'fast'.
- livenessChecks is incremented by one for successful checks. Unlike the old implementation,
we will not drop the node on the first failing check. We instead quickly decay the
livenessChecks give it another chance.
- Order of nodes in bucket doesn't matter anymore.
I am also adding a debug API endpoint to dump the node table content.
Co-authored-by: Martin HS <martin@swende.se>
Here we add a beacon chain light client for use by geth.
Geth can now be configured to run against a beacon chain API endpoint,
without pointing a CL to it. To set this up, use the `--beacon.api` flag. Information
provided by the beacon chain is verified, i.e. geth does not blindly trust the beacon
API endpoint in this mode. The root of trust are the beacon chain 'sync committees'.
The configured beacon API endpoint must provide light client data. At this time, only
Lodestar and Nimbus provide the necessary APIs.
There is also a standalone tool, cmd/blsync, which uses the beacon chain light client
to drive any EL implementation via its engine API.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
This raises the JSON-RPC batch request limits significantly for the engine API endpoint.
The limits are now also hard-coded, so users won't get them wrong. I have chosen these limits:
maximum batch items: 2000
maximum batch response size: 250MB
While it would also be possible to disable batch limits completely for the engine API,
I think having some limits is a good safety net against misbehaving CLs. Since this
isn't configurable, we really want to ensure this limit will never become an issue in the
CL/EL communication, so I set them quite high.
---------
Signed-off-by: jsvisa <delweng@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
This PR adds server-side limits for JSON-RPC batch requests. Before this change, batches
were limited only by processing time. The server would pick calls from the batch and
answer them until the response timeout occurred, then stop processing the remaining batch
items.
Here, we are adding two additional limits which can be configured:
- the 'item limit': batches can have at most N items
- the 'response size limit': batches can contain at most X response bytes
These limits are optional in package rpc. In Geth, we set a default limit of 1000 items
and 25MB response size.
When a batch goes over the limit, an error response is returned to the client. However,
doing this correctly isn't always possible. In JSON-RPC, only method calls with a valid
`id` can be responded to. Since batches may also contain non-call messages or
notifications, the best effort thing we can do to report an error with the batch itself is
reporting the limit violation as an error for the first method call in the batch. If a batch is
too large, but contains only notifications and responses, the error will be reported with
a null `id`.
The RPC client was also changed so it can deal with errors resulting from too large
batches. An older client connected to the server code in this PR could get stuck
until the request timeout occurred when the batch is too large. **Upgrading to a version
of the RPC client containing this change is strongly recommended to avoid timeout issues.**
For some weird reason, when writing the original client implementation, @fjl worked off of
the assumption that responses could be distributed across batches arbitrarily. So for a
batch request containing requests `[A B C]`, the server could respond with `[A B C]` but
also with `[A B] [C]` or even `[A] [B] [C]` and it wouldn't make a difference to the
client.
So in the implementation of BatchCallContext, the client waited for all requests in the
batch individually. If the server didn't respond to some of the requests in the batch, the
client would eventually just time out (if a context was used).
With the addition of batch limits into the server, we anticipate that people will hit this
kind of error way more often. To handle this properly, the client now waits for a single
response batch and expects it to contain all responses to the requests.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
Co-authored-by: Martin Holst Swende <martin@swende.se>
Makes the `geth account ... ` commands usable even if a geth-process is already executing, since the account commands do not read the chaindata, it was not required for those to use the same locking mechanism.
---
Signed-off-by: jsvisa <delweng@gmail.com>
Co-authored-by: Martin Holst Swende <martin@swende.se>
Co-authored-by: Sina Mahmoodi <itz.s1na@gmail.com>
* cmd/utils, node: switch to Pebble as the default db if none exists
* node: fall back to LevelDB on platforms not supporting Pebble
* core/rawdb, node: default to Pebble at the node level
* cmd/geth: fix some tests explicitly using leveldb
* ethdb/pebble: allow double closes, makes tests simpler
* eth: cmd: deprecate personal namespace
* eth: cmd: move deprecation to node
* node: disable toml of enablepersonal
* node: disable personal on ipc as well
* Update node/node.go
Co-authored-by: Martin Holst Swende <martin@swende.se>
* console: error -> warn
* node: less roulette
---------
Co-authored-by: Martin Holst Swende <martin@swende.se>
Here we add special handling for sending an error response when the write timeout of the
HTTP server is just about to expire. This is surprisingly difficult to get right, since is
must be ensured that all output is fully flushed in time, which needs support from
multiple levels of the RPC handler stack:
The timeout response can't use chunked transfer-encoding because there is no way to write
the final terminating chunk. net/http writes it when the topmost handler returns, but the
timeout will already be over by the time that happens. We decided to disable chunked
encoding by setting content-length explicitly.
Gzip compression must also be disabled for timeout responses because we don't know the
true content-length before compressing all output, i.e. compression would reintroduce
chunked transfer-encoding.
This changes the node setup to ignore datadir files
static-nodes.json
trusted-nodes.json
When these files are present, it an error will be printed to the log.
This changes the CI / release builds to use the latest Go version. It also
upgrades golangci-lint to a newer version compatible with Go 1.19.
In Go 1.19, godoc has gained official support for links and lists. The
syntax for code blocks in doc comments has changed and now requires a
leading tab character. gofmt adapts comments to the new syntax
automatically, so there are a lot of comment re-formatting changes in this
PR. We need to apply the new format in order to pass the CI lint stage with
Go 1.19.
With the linter upgrade, I have decided to disable 'gosec' - it produces
too many false-positive warnings. The 'deadcode' and 'varcheck' linters
have also been removed because golangci-lint warns about them being
unmaintained. 'unused' provides similar coverage and we already have it
enabled, so we don't lose much with this change.
This adds a generic mechanism for 'dial options' in the RPC client,
and also implements a specific dial option for the JWT authentication
mechanism used by the engine API. Some real tests for the server-side
authentication handling are also added.
Co-authored-by: Joshua Gutow <jgutow@optimism.io>
Co-authored-by: Felix Lange <fjl@twurst.com>
This change makes http.Server.ReadHeaderTimeout configurable separately
from ReadTimeout for RPC servers. The default is set to the same as
ReadTimeout, which in order to cause no change in existing deployments.
This change ensures the HTTP server will always terminate within
at most 5s, even when all connections are busy and do not become
idle.
Co-authored-by: Felix Lange <fjl@twurst.com>
This enables the following linters
- typecheck
- unused
- staticcheck
- bidichk
- durationcheck
- exportloopref
- gosec
WIth a few exceptions.
- We use a deprecated protobuf in trezor. I didn't want to mess with that, since I cannot meaningfully test any changes there.
- The deprecated TypeMux is used in a few places still, so the warning for it is silenced for now.
- Using string type in context.WithValue is apparently wrong, one should use a custom type, to prevent collisions between different places in the hierarchy of callers. That should be fixed at some point, but may require some attention.
- The warnings for using weak random generator are squashed, since we use a lot of random without need for cryptographic guarantees.
This commit replaces ioutil.TempDir with t.TempDir in tests. The
directory created by t.TempDir is automatically removed when the test
and all its subtests complete.
Prior to this commit, temporary directory created using ioutil.TempDir
had to be removed manually by calling os.RemoveAll, which is omitted in
some tests. The error handling boilerplate e.g.
defer func() {
if err := os.RemoveAll(dir); err != nil {
t.Fatal(err)
}
}
is also tedious, but t.TempDir handles this for us nicely.
Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>