This pull request fixes#30229.
During snap sync, large storage will be split into several pieces and
synchronized concurrently. Unfortunately, the tradeoff is that the respective
merkle trie of each storage chunk will be incomplete due to the incomplete
boundaries. The trie nodes on these boundaries will be discarded, and any
dangling nodes on disk will also be removed if they fall on these paths,
ensuring the state healer won't be blocked.
However, the dangling account trie nodes on the path from the root to the
associated account are left untouched. This means the dangling account trie
nodes could potentially stop the state healing and break the assumption that the
entire subtrie should exist if the subtrie root exists. We should consider the
account trie node as the ancestor of the corresponding storage trie node.
In the scenarios described in the above ticket, the state corruption could occur
if there is a dangling account trie node while some storage trie nodes are
removed due to synchronization redo.
The fixing idea is pretty straightforward, the trie nodes on the path from root
to account should all be explicitly removed if an incomplete storage trie
occurs. Therefore, a `delete` operation has been added into `gentrie` to
explicitly clear the account along with all nodes on this path. The special
thing is that it's a cross-trie clearing. In theory, there may be a dangling
node at any position on this account key and we have to clear all of them.
This pull request defines a gentrie for snap sync purpose.
The stackTrie is used to generate the merkle tree nodes upon receiving a state batch. Several additional options have been added into stackTrie to handle incomplete states (either missing states before or after).
In this pull request, these options have been relocated from stackTrie to genTrie, which serves as a wrapper for stackTrie specifically for snap sync purposes.
Further, the logic for managing incomplete state has been enhanced in this change. Originally, there are two cases handled:
- boundary node filtering
- internal (covered by extension node) node clearing
This changes adds one more:
- Clearing leftover nodes on the boundaries.
This feature is necessary if there are leftover trie nodes in database, otherwise node inconsistency may break the state healing.
This PR removes panics from stacktrie (mostly), and makes the Update return errors instead. While adding tests for this, I also found that one case of possible corruption was not caught, which is now fixed.
This change enhances the stacktrie constructor by introducing an option struct. It also simplifies the `Hash` and `Commit` operations, getting rid of the special handling round root node.
This change
- Removes the owner-notion from a stacktrie; the owner is only ever needed for comitting to the database, but the commit-function, the `writeFn` is provided by the caller, so the caller can just set the owner into the `writeFn` instead of having it passed through the stacktrie.
- Removes the `encoding.BinaryMarshaler`/`encoding.BinaryUnmarshaler` interface from stacktrie. We're not using it, and it is doubtful whether anyone downstream is either.
This change refactors stacktrie to separate the stacktrie itself from the
internal representation of nodes: a stacktrie is not a recursive structure
of stacktries, rather, a framework for representing and operating upon a set of nodes.
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
In this PR, all TryXXX(e.g. TryGet) APIs of trie are renamed to XXX(e.g. Get) with an error returned.
The original XXX(e.g. Get) APIs are renamed to MustXXX(e.g. MustGet) and does not return any error -- they print a log output. A future PR will change the behaviour to panic on errorrs.
The EmptyRootHash and EmptyCodeHash are defined everywhere in the codebase, this PR replaces all of them with unified one defined in core/types package, and also defines constants for TxRoot, WithdrawalsRoot and UncleRoot
This PR introduces a node scheme abstraction. The interface is only implemented by `hashScheme` at the moment, but will be extended by `pathScheme` very soon.
Apart from that, a few changes are also included which is worth mentioning:
- port the changes in the stacktrie, tracking the path prefix of nodes during commit
- use ethdb.Database for constructing trie.Database. This is not necessary right now, but it is required for path-based used to open reverse diff freezer
This changes the CI / release builds to use the latest Go version. It also
upgrades golangci-lint to a newer version compatible with Go 1.19.
In Go 1.19, godoc has gained official support for links and lists. The
syntax for code blocks in doc comments has changed and now requires a
leading tab character. gofmt adapts comments to the new syntax
automatically, so there are a lot of comment re-formatting changes in this
PR. We need to apply the new format in order to pass the CI lint stage with
Go 1.19.
With the linter upgrade, I have decided to disable 'gosec' - it produces
too many false-positive warnings. The 'deadcode' and 'varcheck' linters
have also been removed because golangci-lint warns about them being
unmaintained. 'unused' provides similar coverage and we already have it
enabled, so we don't lose much with this change.
This change speeds up trie hashing and all other activities that require
RLP encoding of trie nodes by approximately 20%. The speedup is achieved by
avoiding reflection overhead during node encoding.
The interface type trie.node now contains a method 'encode' that works with
rlp.EncoderBuffer. Management of EncoderBuffers is left to calling code.
trie.hasher, which is pooled to avoid allocations, now maintains an
EncoderBuffer. This means memory resources related to trie node encoding
are tied to the hasher pool.
Co-authored-by: Felix Lange <fjl@twurst.com>
Trim the search key from head as it's being pushed deeper into the trie. Previously the search key was never modified but each node kept information how to slice and compare it in keyOffset. Now the keyOffset is not needed as this information is included in the slice of the search key. This way the keyOffset can be removed and key manipulation
simplified.
The stacktrie is a bit un-untuitive, API-wise: since it mutates input values.
Such behaviour is dangerous, and easy to get wrong if the calling code 'forgets' this quirk. The behaviour is fixed by this PR, so that the input values are not modified by the stacktrie.
Note: just as with the Trie, the stacktrie still references the live input objects, so it's still _not_ safe to mutate the values form the callsite.
* trie: fix error in stacktrie not committing small roots
* fuzzers: make trie-fuzzer use correct returnvalues
* trie: improved tests
* tests/fuzzers: fuzzer for stacktrie vs regular trie
* test/fuzzers: make stacktrie fuzzer use 32-byte keys
* trie: fix error in stacktrie with small nodes
* trie: add (skipped) testcase for stacktrie
* tests/fuzzers: address review comments for stacktrie fuzzer
* trie: fix docs in stacktrie
core/types: use stacktrie for derivesha
trie: add stacktrie file
trie: fix linter
core/types: use stacktrie for derivesha
rebased: adapt stacktrie to the newer version of DeriveSha
Co-authored-by: Martin Holst Swende <martin@swende.se>
More linter fixes
review feedback: no key offset for nodes converted to hashes
trie: use EncodeRLP for full nodes
core/types: insert txs in order in derivesha
trie: tests for derivesha with stacktrie
trie: make stacktrie use pooled hashers
trie: make stacktrie reuse tmp slice space
trie: minor polishes on stacktrie
trie/stacktrie: less rlp dancing
core/types: explain the contorsions in DeriveSha
ci: fix goimport errors
trie: clear mem on subtrie hashing
squashme: linter fix
stracktrie: use pooling, less allocs (#3)
trie: in-place hex prefix, reduce allocs and add rawNode.EncodeRLP
Reintroduce the `[]node` method, add the missing `EncodeRLP` implementation for `rawNode` and calculate the hex prefix in place.
Co-authored-by: Martin Holst Swende <martin@swende.se>
Co-authored-by: Martin Holst Swende <martin@swende.se>