---
title: Introduction to tracing
sort-key: A
---

Tracing allows users to examine precisely what was executed by the EVM during some
specific transaction or set of transactions. There are two different types of
[transactions](https://ethereum.org/en/developers/docs/transactions) in Ethereum:
value transfers and contract executions. A value transfer just moves ETH from one
account to another. A contract interaction executes some code stored at a contract
address which can include altering stored data and transacting multiple times with
other contracts and externally-owned accounts. A contract execution transaction can
therefore be a complicated web of interactions that can be difficult to unpick. The
transaction receipt contains a status code that shows whether the transaction succeeded
or failed, but more detailed information is not readily available, meaning it is very
difficult to know what a contract execution actually did, what data was modified and
which addresses were touched. This is the problem that EVM tracing solves. Geth traces
transactions by re-running them locally and collecting data about precisely what was
executed by the EVM.

Also see this [Devcon 2022 talk](https://www.youtube.com/watch?v=b8RdmGsilfU) on
tracing in Geth.

## State availability

In its simplest form, tracing a transaction entails requesting the Ethereum node
to reexecute the desired transaction with varying degrees of data collection and
have it return an aggregated summary. In order for a Geth node to reexecute a
transaction, all historical state accessed by the transaction must be available.
This includes:

- Balance, nonce, bytecode and storage of both the recipient and all
  internally invoked contracts.

- Block metadata referenced during execution of both the outer transaction and all
  internally created transactions.

- Intermediate state generated by all preceding transactions contained in the
  same block as the one being traced.

This means the synchronization and pruning configuration of a node limits which
transactions can be traced:

- An **archive** node retains **all historical data** back to genesis. It can therefore
  trace arbitrary transactions at any point in the history of the chain. Tracing a single
  transaction requires reexecuting all preceding transactions in the same block.

- A **node synced from genesis** only retains the most recent 128 block states in
  memory. Older states are represented by a sequence of occasional checkpoints that
  intermediate states can be regenerated from. This means that states within the most recent
  128 blocks are immediately available, while older states have to be regenerated from
  checkpoints "on-the-fly". If the distance between the requested transaction and the most
  recent checkpoint is large, rebuilding the state can take a long time. Tracing a single
  transaction requires reexecuting all preceding transactions in the same block **and** all
  preceding blocks until the previous stored checkpoint.

- A **snap synced** node holds the most recent 128 blocks in memory, so transactions in that
  range are always accessible. However, snap-sync only starts processing from a relatively
  recent block (as opposed to genesis for a full node). Between the initial sync block and
  the 128 most recent blocks, the node stores occasional checkpoints that can be used to
  rebuild the state on-the-fly. This means transactions can be traced back as far as the
  block that was used for the initial sync. Tracing a single transaction requires reexecuting
  all preceding transactions in the same block **and** all preceding blocks until the previous
  stored checkpoint.

- A **light synced** node retrieving data **on demand** can in theory trace transactions
  for which all required historical state is readily available in the network. This is
  because the data required to generate the trace is requested from an LES-serving full
  node. In practice, data availability **cannot** be reasonably assumed.

![state pruning options](/static/images/state-pruning.png)

*This image shows the state stored by each sync-mode - red indicates stored state. The full width of each line represents origin to present head*

More detailed information about syncing is available on the [sync modes page](/docs/interface/sync-modes).

When a trace of a specific transaction is executed, the state is prepared by fetching the
state of the parent block from the database. If it is not available, Geth will crawl backwards
in time to find the next available state but only up to a limit defined in the `reexec`
parameter which defaults to 128 blocks. If no state is available within the `reexec`
window then the trace fails with `Error: required historical state unavailable` and
the `reexec` parameter must be increased. If a valid state *is* found in the `reexec`
window, then Geth sequentially re-executes the transactions in each block between the
last available state and the target block. The greater the value of `reexec` the longer
the tracing will take because more blocks have to be re-executed to regenerate the target
state.

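The lookup described above can be sketched as a small simulation. This is an illustrative model only, not Geth's implementation: the function names, the `available_states` set and the exact handling of the window boundary are assumptions made for the example.

```python
def find_base_state(target_block, available_states, reexec=128):
    """Return the closest block at or before the parent of `target_block`
    whose state is available, searching at most `reexec` blocks back."""
    parent = target_block - 1
    # Distance 0 checks the parent itself, then the search walks backwards.
    for distance in range(reexec + 1):
        candidate = parent - distance
        if candidate < 0:
            break  # walked past genesis
        if candidate in available_states:
            return candidate
    raise LookupError("required historical state unavailable")


def blocks_to_reexecute(target_block, base_block):
    """Blocks that must be re-executed to rebuild the parent state of
    `target_block`, starting from the state stored at `base_block`."""
    return list(range(base_block + 1, target_block))


if __name__ == "__main__":
    available = {1000, 1100}  # hypothetical blocks with stored state
    base = find_base_state(1105, available)
    print(base, blocks_to_reexecute(1105, base))
```

In this model, a larger `reexec` widens the search window but also increases the number of blocks that may have to be re-executed before the trace can start.
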
The `debug_getAccessibleState` endpoint is a useful tool for estimating a suitable
value for `reexec`. Passing the number of the block that contains the target transaction
and a search distance to this endpoint will return the number of blocks behind the current
head where the most recent available state exists. This value can be passed to the tracer
as `reexec`.

It is also possible to force Geth to store the state for specific sequences of blocks by
stopping Geth and running it again with `--gcmode archive` for some period. This prevents
state pruning for blocks that arrive while Geth is running with `--gcmode archive`.

_There are exceptions to the above rules when running batch traces of entire blocks or chain segments. Those will be detailed later._

## Types of trace

### Basic traces

The simplest type of transaction trace that Geth can generate is a raw EVM opcode
trace. For every EVM instruction the transaction executes, a structured log entry is
emitted, containing all contextual metadata deemed useful. This includes the *program
counter*, *opcode name*, *opcode cost*, *remaining gas*, *execution depth* and any
*error that occurred*. The structured logs can optionally also contain the content of the
*execution stack*, *execution memory* and *contract storage*.

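As a rough illustration of working with such a trace, the sketch below summarizes a list of structured log entries. The sample data is hand-written rather than real node output, and the key names (`structLogs`, `pc`, `op`, `gas`, `gasCost`, `depth`, `error`) are assumed to match the JSON returned by the default logger; check them against an actual trace before relying on them.

```python
from collections import Counter

# Hand-written sample for illustration; not output from a real node.
sample_trace = {
    "gas": 21009,
    "failed": False,
    "returnValue": "",
    "structLogs": [
        {"pc": 0, "op": "PUSH1", "gas": 79000, "gasCost": 3, "depth": 1, "stack": []},
        {"pc": 2, "op": "PUSH1", "gas": 78997, "gasCost": 3, "depth": 1, "stack": ["0x80"]},
        {"pc": 4, "op": "MSTORE", "gas": 78994, "gasCost": 12, "depth": 1,
         "stack": ["0x80", "0x40"]},
    ],
}


def summarize(trace):
    """Condense an opcode trace into a few aggregate figures."""
    logs = trace["structLogs"]
    return {
        "steps": len(logs),
        "opcode_counts": dict(Counter(entry["op"] for entry in logs)),
        "gas_cost_total": sum(entry["gasCost"] for entry in logs),
        "max_depth": max((entry["depth"] for entry in logs), default=0),
        # Only entries that carry a non-empty error field are reported.
        "errors": [entry["error"] for entry in logs if entry.get("error")],
    }


if __name__ == "__main__":
    print(summarize(sample_trace))
```
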
Read more about Geth's basic traces on the [basic traces page](/docs/evm-tracing/basic-traces).

### Built-in tracers

The tracing API accepts an optional `tracer` parameter that defines how the data
returned to the API call should be processed. If this parameter is omitted the
default tracer is used. The default is the struct (or 'opcode') logger. These raw
opcode traces are sometimes useful, but the returned data is very low level and
can be too extensive and awkward to read for many use-cases. A full opcode trace
can easily go into the hundreds of megabytes, making it very resource intensive
to get out of the node and process externally. For these reasons, there is a set
of non-default built-in tracers that can be named in the API call to return
different data from the method. Under the hood, these tracers are Go or JavaScript
functions that do some specific preprocessing on the trace data before it is returned.

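To make the role of the `tracer` parameter concrete, here is a minimal sketch that builds the JSON-RPC request body for a trace call without sending it. It assumes the `debug_traceTransaction` method and the `callTracer` built-in tracer; the helper name `trace_request` and the transaction hash are placeholders invented for the example.

```python
import json


def trace_request(tx_hash, tracer=None, reexec=None, request_id=1):
    """Build a JSON-RPC request body for tracing one transaction."""
    config = {}
    if tracer is not None:
        config["tracer"] = tracer  # omitted: the default opcode logger is used
    if reexec is not None:
        config["reexec"] = reexec  # how far back to search for stored state
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "debug_traceTransaction",
        "params": [tx_hash, config],
    }


if __name__ == "__main__":
    placeholder_hash = "0x" + "ab" * 32  # not a real transaction
    print(json.dumps(trace_request(placeholder_hash, tracer="callTracer"), indent=2))
```

The resulting body could then be sent to a node that exposes the `debug` namespace over its RPC interface.
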
More information about Geth's built-in tracers is available on the
[built-in tracers](/docs/evm-tracing/builtin-tracers)
page.

### Custom tracers

In addition to built-in tracers, it is possible to provide custom code that hooks
into events in the EVM to process and return data in a consumable format. Custom
tracers can be written either in JavaScript or Go. JS tracers are good for quick
prototyping and experimentation as well as for less intensive applications. Go
tracers are performant but require the tracer to be compiled together with the
Geth source code. In both cases, developers only have to gather the data they actually
need, and can do any processing at the source.

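As a hedged sketch of the JavaScript route, the snippet below wraps a tiny JS tracer that counts executed opcodes into a trace request. It assumes that a JS tracer is passed as a source string in the `tracer` field and exposes `step`, `fault` and `result` callbacks; the custom tracers page is the authoritative reference for that interface. The Python helper and the transaction hash are placeholders for illustration.

```python
import json

# JavaScript source of a tracer that counts executed opcodes (assumed interface).
OPCODE_COUNTER_TRACER = """{
  count: 0,
  step: function(log, db) { this.count++ },
  fault: function(log, db) {},
  result: function(ctx, db) { return this.count }
}"""


def custom_trace_request(tx_hash, tracer_source, request_id=1):
    """Build a request body that carries custom tracer source code."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "debug_traceTransaction",
        "params": [tx_hash, {"tracer": tracer_source}],
    }


if __name__ == "__main__":
    placeholder_hash = "0x" + "cd" * 32  # not a real transaction
    print(json.dumps(custom_trace_request(placeholder_hash, OPCODE_COUNTER_TRACER)))
```

Because the counting happens inside the node, only a single number would be returned instead of the full opcode trace.
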
More information about custom tracers is available on the
[custom tracers](/docs/evm-tracing/custom-tracer)
page.

## Summary

This page gave an introduction to the concept of tracing and explained issues around
state availability. More detailed information on Geth's built-in and custom tracers
can be found on their dedicated pages.