---
title: Introduction to tracing
sort-key: A
---
Tracing allows users to examine precisely what was executed by the EVM during some
specific transaction or set of transactions. There are two different types of
[transactions](https://ethereum.org/en/developers/docs/transactions) in Ethereum:
value transfers and contract executions. A value transfer just moves ETH from one
account to another. A contract execution runs code stored at a contract
address, which can include altering stored data and transacting multiple times with
other contracts and externally-owned accounts. A contract execution transaction can
therefore be a complicated web of interactions that can be difficult to unpick. The
transaction receipt contains a status code that shows whether the transaction succeeded
or failed, but more detailed information is not readily available, meaning it is very
difficult to know what a contract execution actually did, what data was modified and
which addresses were touched. This is the problem that EVM tracing solves. Geth traces
transactions by re-running them locally and collecting data about precisely what was
executed by the EVM.

Also see this [Devcon 2022 talk](https://www.youtube.com/watch?v=b8RdmGsilfU) on
tracing in Geth.
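
As a rough illustration of what a trace request looks like, the sketch below builds a JSON-RPC payload for `debug_traceTransaction`, the method Geth exposes for tracing a single transaction. The transaction hash is a placeholder, the endpoint `http://localhost:8545` is an assumption about a locally running node with the `debug` namespace enabled, and the helper names are purely illustrative.

```python
import json
import urllib.request

# Placeholder hash for illustration only; substitute a real transaction hash.
EXAMPLE_TX_HASH = "0x" + "ab" * 32


def build_trace_request(tx_hash, options=None, request_id=1):
    """Build the JSON-RPC payload for a debug_traceTransaction call."""
    params = [tx_hash]
    if options:
        # Optional tracing options, e.g. {"reexec": 256} or {"tracer": "..."}.
        params.append(options)
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "debug_traceTransaction",
        "params": params,
    }


def send_request(payload, url="http://localhost:8545"):
    """POST the payload to a node (the URL is an assumed local endpoint)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


payload = build_trace_request(EXAMPLE_TX_HASH)
print(json.dumps(payload, indent=2))
```

Calling `send_request(payload)` would submit the request, provided a node is actually listening at the assumed URL.
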
## State availability
In its simplest form, tracing a transaction entails requesting the Ethereum node
to reexecute the desired transaction with varying degrees of data collection and
have it return an aggregated summary. In order for a Geth node to reexecute a
transaction, all historical state accessed by the transaction must be available.
This includes:
- Balance, nonce, bytecode and storage of both the recipient as well as all
internally invoked contracts.
- Block metadata referenced during execution of both the outer as well as all
internally created transactions.
- Intermediate state generated by all preceding transactions contained in the
same block as the one being traced.

This means the synchronization and pruning configuration of a node limits which
transactions it can trace:
- An **archive** node retains **all historical data** back to genesis. It can therefore
trace arbitrary transactions at any point in the history of the chain. Tracing a single
transaction requires reexecuting all preceding transactions in the same block.
- A **full node synced from genesis** only retains the most recent 128 block states in
memory. Older states are represented by a sequence of occasional checkpoints that
intermediate states can be regenerated from. This means that states within the most recent
128 blocks are immediately available, whereas older states have to be regenerated from
checkpoints "on-the-fly". If the distance between the requested transaction and the most recent
checkpoint before it is large, rebuilding the state can take a long time. Tracing a single
transaction requires reexecuting all preceding transactions in the same block **and** all
preceding blocks back to the previous stored checkpoint.
- A **snap synced** node holds the most recent 128 block states in memory, so transactions in
that range are always accessible. However, snap sync only starts processing from a relatively
recent block (as opposed to genesis for a full node). Between the initial sync block and
the 128 most recent blocks, the node stores occasional checkpoints that can be used to
rebuild the state on-the-fly. This means transactions can be traced back as far as the
block that was used for the initial sync. Tracing a single transaction requires reexecuting
all preceding transactions in the same block **and** all preceding blocks back to the previous
stored checkpoint.
- A **light synced** node retrieving data **on demand** can in theory trace transactions
for which all required historical state is readily available in the network. This is
because the data required to generate the trace is requested from an les-serving full
node. In practice, data availability **cannot** be reasonably assumed.

![state pruning options](/static/images/state-pruning.png)

*This image shows the state stored by each sync mode. Red indicates stored state. The full width of each line represents origin to present head.*

More detailed information about syncing is available on the [sync modes page](/docs/interface/sync-modes).

When a trace of a specific transaction is executed, the state is prepared by fetching the
state of the parent block from the database. If it is not available, Geth will crawl backwards
in time to find the nearest available state, but only up to a limit defined in the `reexec`
parameter, which defaults to 128 blocks. If no state is available within the `reexec`
window then the trace fails with `Error: required historical state unavailable` and
the `reexec` parameter must be increased. If a valid state *is* found in the `reexec`
window, then Geth sequentially re-executes the transactions in each block between the
last available state and the target block. The greater the value of `reexec`, the longer
the tracing can take, because more blocks may have to be re-executed to regenerate the target
state.
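
The backward search described above can be modelled with a small sketch. This is a simplified illustration, not Geth's actual implementation: the function name, the representation of stored states as a set of block numbers, and the exact boundary handling of the `reexec` window are all assumptions made for the example.

```python
def blocks_to_reexecute(target_block, available_states, reexec=128):
    """Return the blocks to re-execute before tracing inside target_block.

    available_states is a set of block numbers whose post-state is stored.
    Raises LookupError when no stored state lies within `reexec` blocks
    of the parent block (simplified model of the reexec window).
    """
    parent = target_block - 1
    # Walk backwards from the parent block, at most `reexec` blocks.
    for distance in range(reexec + 1):
        candidate = parent - distance
        if candidate < 0:
            break
        if candidate in available_states:
            # Every block after the stored state, up to and including the
            # parent, must be re-executed to regenerate the parent state.
            return list(range(candidate + 1, parent + 1))
    raise LookupError("required historical state unavailable")


stored = {1000, 1100}  # hypothetical block numbers with stored state

# Parent state is stored, so nothing has to be re-executed.
immediate = blocks_to_reexecute(1101, stored)

# State is 49 blocks behind the parent, inside the default window.
nearby = blocks_to_reexecute(1150, stored)

# State is 299 blocks behind the parent: fails with the default window,
# succeeds once reexec is raised.
try:
    blocks_to_reexecute(1400, stored)
    failed = False
except LookupError:
    failed = True
widened = blocks_to_reexecute(1400, stored, reexec=300)
```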

The `debug_getAccessibleState` endpoint is a useful tool for estimating a suitable
value for `reexec`. Passing the number of the block that contains the target transaction
and a search distance to this endpoint will return the number of blocks behind the current
head where the most recent available state exists. This value can be passed to the tracer
as `reexec`.

It is also possible to force Geth to store the state for specific sequences of blocks by
stopping Geth and running it again with `--gcmode archive` for some period. This prevents state
pruning for blocks that arrive while Geth is running with `--gcmode archive`.

_There are exceptions to the above rules when running batch traces of entire blocks or chain segments. Those will be detailed later._
## Types of trace
### Basic traces
The simplest type of transaction trace that Geth can generate is a raw EVM opcode
trace. For every EVM instruction the transaction executes, a structured log entry is
emitted, containing all contextual metadata deemed useful. This includes the *program
counter*, *opcode name*, *opcode cost*, *remaining gas*, *execution depth* and any
*error that occurred*. The structured logs can optionally also contain the content of the
*execution stack*, *execution memory* and *contract storage*.

Read more about Geth's basic traces on the [basic traces page](/docs/evm-tracing/basic-traces).
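
To give a feel for how these structured log entries can be consumed, the sketch below aggregates a small hand-written sample shaped like the `structLogs` array of a basic trace. The sample values are invented for illustration and do not come from a real transaction, and the `summarize` helper is hypothetical.

```python
from collections import Counter

# Hand-written sample shaped like the `structLogs` array of a basic trace.
# The values are invented for illustration only.
sample_struct_logs = [
    {"pc": 0, "op": "PUSH1", "gas": 100000, "gasCost": 3, "depth": 1},
    {"pc": 2, "op": "PUSH1", "gas": 99997, "gasCost": 3, "depth": 1},
    {"pc": 4, "op": "MSTORE", "gas": 99994, "gasCost": 12, "depth": 1},
    {"pc": 5, "op": "STOP", "gas": 99982, "gasCost": 0, "depth": 1},
]


def summarize(struct_logs):
    """Count opcodes, total their gas cost and find the deepest call level."""
    op_counts = Counter()
    gas_by_op = Counter()
    max_depth = 0
    for entry in struct_logs:
        op_counts[entry["op"]] += 1
        gas_by_op[entry["op"]] += entry["gasCost"]
        max_depth = max(max_depth, entry["depth"])
    return {
        "op_counts": dict(op_counts),
        "gas_by_op": dict(gas_by_op),
        "max_depth": max_depth,
    }


summary = summarize(sample_struct_logs)
print(summary)
```

Even this tiny aggregation hints at why raw opcode traces are usually post-processed: the useful signal is typically a summary rather than the individual entries.
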
### Built-in tracers
The tracing API accepts an optional `tracer` parameter that defines how the data
returned to the API call should be processed. If this parameter is omitted the
default tracer is used. The default is the struct (or 'opcode') logger. These raw
opcode traces are sometimes useful, but the returned data is very low level and
can be too extensive and awkward to read for many use-cases. A full opcode trace
can easily go into the hundreds of megabytes, making it very resource intensive
to get out of the node and process externally. For these reasons, there is a set
of non-default built-in tracers that can be named in the API call to return
different data from the method. Under the hood, these tracers are Go or JavaScript
functions that do some specific preprocessing on the trace data before it is returned.

More information about Geth's built-in tracers is available on the
[built-in tracers](/docs/evm-tracing/builtin-tracers) page.
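
As an example of naming a built-in tracer, the sketch below selects `callTracer` through the `tracer` option and then flattens a nested call tree of the kind that tracer returns. The sample result is hand-written with made-up addresses, and `flatten_calls` is a hypothetical helper; consult the built-in tracers page for the authoritative output format.

```python
# Options object passed as the second parameter of the trace call.
trace_options = {"tracer": "callTracer"}

# Hand-written sample shaped like a callTracer result (addresses invented).
sample_call_trace = {
    "type": "CALL",
    "from": "0x" + "11" * 20,
    "to": "0x" + "22" * 20,
    "calls": [
        {"type": "STATICCALL", "from": "0x" + "22" * 20, "to": "0x" + "33" * 20},
        {
            "type": "CALL",
            "from": "0x" + "22" * 20,
            "to": "0x" + "44" * 20,
            "calls": [
                {
                    "type": "DELEGATECALL",
                    "from": "0x" + "44" * 20,
                    "to": "0x" + "55" * 20,
                }
            ],
        },
    ],
}


def flatten_calls(call, depth=0):
    """Yield (depth, call type, callee) for a frame and all nested frames."""
    yield depth, call["type"], call.get("to")
    for child in call.get("calls", []):
        yield from flatten_calls(child, depth + 1)


frames = list(flatten_calls(sample_call_trace))
for depth, call_type, callee in frames:
    # Indent each frame by its nesting depth to show the call hierarchy.
    print("  " * depth + f"{call_type} -> {callee}")
```
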
### Custom tracers
In addition to built-in tracers, it is possible to provide custom code that hooks
to events in the EVM to process and return data in a consumable format. This means
developers only have to gather the data they actually need, and can do any processing
at the source. Custom tracers can be written either in JavaScript or Go. JS tracers are
good for quick prototyping and experimentation as well as for less intensive applications.
Go tracers are performant but require the tracer to be compiled together with the
Geth source code.

More information about custom tracers is available on the
[custom tracers](/docs/evm-tracing/custom-tracer) page.
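
A JavaScript tracer is supplied to the node as a string in the `tracer` option. The sketch below wraps a small opcode-counting tracer in a request payload. It assumes the JavaScript tracer interface (`step`, `fault` and `result` methods) described on the custom tracers page; the tracer itself, the placeholder transaction hash and the helper name are illustrative only.

```python
import json

# JavaScript tracer source, sent to the node as a string. It counts how often
# each opcode is executed: `step` runs for every EVM instruction, `fault` runs
# when an error occurs, and `result` produces the value that is returned.
OPCODE_COUNTER_TRACER = """
{
  counts: {},
  step: function(log, db) {
    var op = log.op.toString();
    this.counts[op] = (this.counts[op] || 0) + 1;
  },
  fault: function(log, db) {},
  result: function(ctx, db) { return this.counts; }
}
""".strip()

# Placeholder hash for illustration only; substitute a real transaction hash.
EXAMPLE_TX_HASH = "0x" + "cd" * 32


def custom_tracer_request(tx_hash, tracer_source, request_id=1):
    """Build a debug_traceTransaction payload that carries a JS tracer."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "debug_traceTransaction",
        "params": [tx_hash, {"tracer": tracer_source}],
    }


custom_payload = custom_tracer_request(EXAMPLE_TX_HASH, OPCODE_COUNTER_TRACER)
encoded = json.dumps(custom_payload)
print(encoded[:80] + "...")
```

Because the aggregation happens inside the node, only the final opcode counts would travel over the wire rather than every individual log entry.
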
## Summary
This page gave an introduction to the concept of tracing and explained issues around
state availability. More detailed information on Geth's built-in and custom tracers
can be found on their dedicated pages.