Original problem was caused by #28595, where we made it so that as soon as we start to sync, the root of the disk layer is deleted. That is not wrong per se, but another part of the code uses the "presence of the root" as an init-check for the pathdb. And, since the init-check now failed, the code tried to re-initialize it which failed since a sync was already ongoing.
The total impact being: after a state-sync has begun, if the node for some reason is is shut down, it will refuse to start up again, with the error message: `Fatal: Failed to register the Ethereum service: waiting for sync.`.
This change also modifies how `geth removedb` works, so that the user is prompted for two things: `state data` and `ancient chain`. The former includes both the chaindb aswell as any state history stored in ancients.
---------
Co-authored-by: Martin HS <martin@swende.se>
This fixes a database corruption issue that could occur during state healing.
When sync is aborted while certain modifications were already committed, and a
reorg occurs, the database would contain incorrect trie nodes stored by path.
These nodes need to detected/deleted in order to obtain a complete and fully correct state
after state healing.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
Add read locking of db lock around access to dirties cache in hashdb.Database to prevent
data race versus hashdb.Database.dereference which can modify the dirities map by deleting
an item.
Fixes#28541
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
* rpc: make subscription test faster
reduces time for TestClientSubscriptionChannelClose
from 25 sec to < 1 sec.
* trie: cache trie nodes for faster sanity check
This reduces the time spent on TestIncompleteSyncHash
from ~25s to ~16s.
* core/forkid: speed up validation test
This takes the validation test from > 5s to sub 1 sec
* core/state: improve snapshot test run
brings the time for TestSnapshotRandom from 13s down to 6s
* accounts/keystore: improve keyfile test
This removes some unnecessary waits and reduces the
runtime of TestUpdatedKeyfileContents from 5 to 3 seconds
* trie: remove resolver
* trie: only check ~5% of all trie nodes
* trie: use pooling of iterator states in iterator
The node iterator burns through a lot of memory while iterating a trie, and a lot of
that can be avoided by using a fairly small pool (max 40 items).
name old time/op new time/op delta
Iterator-8 6.22ms ± 3% 5.40ms ± 6% -13.18% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
Iterator-8 2.36MB ± 0% 1.67MB ± 0% -29.23% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
Iterator-8 37.0k ± 0% 29.8k ± 0% ~ (p=0.079 n=4+5)
* ethdb/memorydb: avoid one copying of key
By making the transformation from []byte to string at an earlier point,
we save an allocation which otherwise happens later on.
name old time/op new time/op delta
BatchAllocs-8 412µs ± 6% 382µs ± 2% -7.18% (p=0.016 n=5+4)
name old alloc/op new alloc/op delta
BatchAllocs-8 480kB ± 0% 490kB ± 0% +1.93% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
BatchAllocs-8 3.03k ± 0% 2.03k ± 0% -32.98% (p=0.008 n=5+5)
This PR moves our fuzzers from tests/fuzzers into whatever their respective 'native' package is.
The historical reason why they were placed in an external location, is that when they were based on go-fuzz, they could not be "hidden" via the _test.go prefix. So in order to shove them away from the go-ethereum "production code", they were put aside.
But now we've rewritten them to be based on golang testing, and thus can be brought back. I've left (in tests/) the ones that are not production (bls128381), require non-standard imports (secp requires btcec, bn256 requires gnark/google/cloudflare deps).
This PR also adds a fuzzer for precompiled contracts, because why not.
This PR utilizes a newly rewritten replacement for go-118-fuzz-build, namely gofuzz-shim, which utilises the inputs from the fuzzing engine better.
This change allows the creation of a genesis block for verkle testnets. This makes for a chunk of code that is easier to review and still touches many discussion points.
This PR removes panics from stacktrie (mostly), and makes the Update return errors instead. While adding tests for this, I also found that one case of possible corruption was not caught, which is now fixed.
This change enhances the stacktrie constructor by introducing an option struct. It also simplifies the `Hash` and `Commit` operations, getting rid of the special handling round root node.
During snap-sync, we request ranges of values: either a range of accounts or a range of storage values. For any large trie, e.g. the main account trie or a large storage trie, we cannot fetch everything at once.
Short version; we split it up and request in multiple stages. To do so, we use an origin field, to say "Give me all storage key/values where key > 0x20000000000000000". When the server fulfils this, the server provides the first key after origin, let's say 0x2e030000000000000 -- never providing the exact origin. However, the client-side needs to be able to verify that the 0x2e03.. indeed is the first one after 0x2000.., and therefore the attached proof concerns the origin, not the first key.
So, short-short version: the left-hand side of the proof relates to the origin, and is free-standing from the first leaf.
On the other hand, (pun intended), the right-hand side, there's no such 'gap' between "along what path does the proof walk" and the last provided leaf. The proof must prove the last element (unless there are no elements).
Therefore, we can simplify the semantics for trie.VerifyRangeProof by removing an argument. This doesn't make much difference in practice, but makes it so that we can remove some tests. The reason I am raising this is that the upcoming stacktrie-based verifier does not support such fancy features as standalone right-hand borders.
This change addresses an issue in snap sync, specifically when the entire sync process can be halted due to an encountered empty storage range.
Currently, on the snap sync client side, the response to an empty (partial) storage range is discarded as a non-delivery. However, this response can be a valid response, when the particular range requested does not contain any slots.
For instance, consider a large contract where the entire key space is divided into 16 chunks, and there are no available slots in the last chunk [0xf] -> [end]. When the node receives a request for this particular range, the response includes:
The proof with origin [0xf]
A nil storage slot set
If we simply discard this response, the finalization of the last range will be skipped, halting the entire sync process indefinitely. The test case TestSyncWithUnevenStorage can reproduce the scenario described above.
In addition, this change also defines the common variables MaxAddress and MaxHash.
This change
- Removes the owner-notion from a stacktrie; the owner is only ever needed for comitting to the database, but the commit-function, the `writeFn` is provided by the caller, so the caller can just set the owner into the `writeFn` instead of having it passed through the stacktrie.
- Removes the `encoding.BinaryMarshaler`/`encoding.BinaryUnmarshaler` interface from stacktrie. We're not using it, and it is doubtful whether anyone downstream is either.
This is a minor refactor in preparation of changes to range verifier. This PR contains no intentional functional changes but moves (and renames) the light.NodeSet
This change refactors stacktrie to separate the stacktrie itself from the
internal representation of nodes: a stacktrie is not a recursive structure
of stacktries, rather, a framework for representing and operating upon a set of nodes.
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
This change fixes the bug in a benchmark, where the input to the trie is reused in a way which is not correct.
---------
Co-authored-by: Martin Holst Swende <martin@swende.se>
The Go authors updated golang/x/ext to change the function signature of the slices sort method.
It's an entire shitshow now because x/ext is not tagged, so everyone's codebase just
picked a new version that some other dep depends on, causing our code to fail building.
This PR updates the dep on our code too and does all the refactorings to follow upstream...
* all: implement path-based state scheme
* all: edits from review
* core/rawdb, trie/triedb/pathdb: review changes
* core, light, trie, eth, tests: reimplement pbss history
* core, trie/triedb/pathdb: track block number in state history
* trie/triedb/pathdb: add history documentation
* core, trie/triedb/pathdb: address comments from Peter's review
Important changes to list:
- Cache trie nodes by path in clean cache
- Remove root->id mappings when history is truncated
* trie/triedb/pathdb: fallback to disk if unexpect node in clean cache
* core/rawdb: fix tests
* trie/triedb/pathdb: rename metrics, change clean cache key
* trie/triedb: manage the clean cache inside of disk layer
* trie/triedb/pathdb: move journal function
* trie/triedb/path: fix tests
* trie/triedb/pathdb: fix journal
* trie/triedb/pathdb: fix history
* trie/triedb/pathdb: try to fix tests on windows
* core, trie: address comments
* trie/triedb/pathdb: fix test issues
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
Co-authored-by: Martin Holst Swende <martin@swende.se>
This change makes the StateDB track the state key value diff of a block transition.
We already tracked current account and storage values for the purpose of updating
the state snapshot. With this PR, we now also track the original (pre-transition) values
of accounts and storage slots.
The clean trie cache is persisted periodically, therefore Geth can
quickly warmup the cache in next restart.
However it will reduce the robustness of system. The assumption is
held in Geth that if the parent trie node is present, then the entire
sub-trie associated with the parent are all prensent.
Imagine the scenario that Geth rewinds itself to a past block and
restart, but Geth finds the root node of "future state" in clean
cache then regard this state is present in disk, while is not in fact.
Another example is offline pruning tool. Whenever an offline pruning
is performed, the clean cache file has to be removed to aviod hitting
the root node of "deleted states" in clean cache.
All in all, compare with the minor performance gain, system robustness
is something we care more.
* core/state, light, les: make signature of ContractCode hash-independent
* push current state for feedback
* les: fix unit test
* core, les, light: fix les unittests
* core/state, trie, les, light: fix state iterator
* core, les: address comments
* les: fix lint
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
Verkle trees store the code inside the trie. This PR changes the interface to pass the code, as well as the dirty flag to tell the trie package if the code is dirty and needs to be updated. This is a no-op for the MPT and the odr trie.
The state availability is checked during the creation of a state reader.
- In hash-based database, if the specified root node does not exist on disk disk, then
the state reader won't be created and an error will be returned.
- In path-based database, if the specified state layer is not available, then the
state reader won't be created and an error will be returned.
This change also contains a stricter semantics regarding the `Commit` operation: once it has been performed, the trie is no longer usable, and certain operations will return an error.
This removes the feature where top nodes of the proof can be elided.
It was intended to be used by the LES server, to save bandwidth
when the client had already fetched parts of the state and only needed
some extra nodes to complete the proof. Alas, it never got implemented
in the client.
Continuing with a series of PRs to make the Trie interface more generic, this PR moves
the RLP encoding of storage slots inside the StateTrie and light.Trie implementations,
as other types of tries don't use RLP.
* trie: add node type common package
In trie/types package, a few node wrappers are defined, which will be used
in both trie package, trie/snap package, etc. Therefore, a standalone common
package is created to put these stuffs.
* trie: rename trie/types to trie/trienode
In this PR, all TryXXX(e.g. TryGet) APIs of trie are renamed to XXX(e.g. Get) with an error returned.
The original XXX(e.g. Get) APIs are renamed to MustXXX(e.g. MustGet) and does not return any error -- they print a log output. A future PR will change the behaviour to panic on errorrs.