From dcacefbd46a8a188f61983c1316f3f0b49da6f72 Mon Sep 17 00:00:00 2001 From: Joe Date: Thu, 3 Nov 2022 16:11:11 +0000 Subject: [PATCH] initial commit for new tracing pages --- .../{ => evm-tracing}/built-in-tracers.md | 49 +++++- .../{ => evm-tracing}/custom-tracer.md | 0 .../dapp-developer/evm-tracing/index.md | 57 +++++++ .../docs/developers/dapp-developer/tracing.md | 155 ------------------ 4 files changed, 103 insertions(+), 158 deletions(-) rename src/pages/docs/developers/dapp-developer/{ => evm-tracing}/built-in-tracers.md (73%) rename src/pages/docs/developers/dapp-developer/{ => evm-tracing}/custom-tracer.md (100%) create mode 100644 src/pages/docs/developers/dapp-developer/evm-tracing/index.md delete mode 100644 src/pages/docs/developers/dapp-developer/tracing.md diff --git a/src/pages/docs/developers/dapp-developer/built-in-tracers.md b/src/pages/docs/developers/dapp-developer/evm-tracing/built-in-tracers.md similarity index 73% rename from src/pages/docs/developers/dapp-developer/built-in-tracers.md rename to src/pages/docs/developers/dapp-developer/evm-tracing/built-in-tracers.md index 2c93414daf..2cf967c64e 100644 --- a/src/pages/docs/developers/dapp-developer/built-in-tracers.md +++ b/src/pages/docs/developers/dapp-developer/evm-tracing/built-in-tracers.md @@ -3,11 +3,11 @@ title: Built-in tracers description: Explanation of the tracers that come bundled in Geth as part of the tracing API. --- -Geth comes bundled with a choice of tracers ready for usage through the [tracing API](/docs/rpc/ns-debug). Some of them are implemented natively in Go, and others in JS. In this page a summary of each of these will be outlined. They have to be specified by name when sending a request. The only exception is the opcode logger (otherwise known as struct logger) which is the default tracer for all the methods and cannot be specified by name. +Geth comes bundled with a choice of tracers that can be invoked via the [tracing API](/docs/rpc/ns-debug). 
Some of these built-in tracers are implemented natively in Go, and others in Javascript. The default tracer for all methods is the opcode logger (otherwise known as the struct logger). Other tracers have to be specified by name when sending a request. -## Struct logger +## Struct/opcode logger -Struct logger or opcode logger is a native Go tracer which executes a transaction and emits the opcode and execution context at every step. This is the tracer that will be used when no name is passed to the API, e.g. `debug.traceTransaction()`. The following information is emitted at each step: +The struct logger (also known as the opcode logger) is a native Go tracer which executes a transaction and emits the opcode and execution context at every step. This is the tracer that will be used when no name is passed to the API, e.g. `debug.traceTransaction()`. The following information is emitted at each step: | field | type | description | | ---------- | ------------- | -------------------------------------------------------------------------------------------------------------------------------- | @@ -107,3 +107,46 @@ The following are a list of tracers written in JS that come as part of Geth: - `opcountTracer` Counts the total number of opcodes executed - `trigramTracer`: Counts the opcode trigrams - `unigramTracer`: Counts the occurances of each opcode + + + + + + +############################# + +To follow along with this tutorial, transaction hashes can be found from a local Geth node (e.g. by attaching a [Javascript console](/docs/interface/javascript-console) and running `eth.getBlock('latest')` then passing a transaction hash from the returned block to `debug.traceTransaction()`) or from a block explorer (for [Mainnet](https://etherscan.io/) or a [testnet](https://goerli.etherscan.io/)). + +It is also possible to configure the trace by passing Boolean (true/false) values for four parameters that tweak the verbosity of the trace. 
By default, the _EVM memory_ and _Return data_ are not reported but the _EVM stack_ and _EVM storage_ are. To report the maximum amount of data: + +```shell +enableMemory: true +disableStack: false +disableStorage: false +enableReturnData: true +``` + +An example call, made in the Geth Javascript console, configured to report the maximum amount of data looks as follows: + +```js +debug.traceTransaction('0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f', { + enableMemory: true, + disableStack: false, + disableStorage: false, + enableReturnData: true +}); +``` + +Running the above operation on the Rinkeby network (with a node retaining enough history) will result in this [trace dump](https://gist.github.com/karalabe/c91f95ac57f5e57f8b950ec65ecc697f). + +Alternatively, disabling _EVM Stack_, _EVM Memory_, _Storage_ and _Return data_ (as demonstrated in the Curl request below) results in the following, much shorter, [trace dump](https://gist.github.com/karalabe/d74a7cb33a70f2af75e7824fc772c5b4). 
+ +``` +$ curl -H "Content-Type: application/json" -d '{"id": 1, "method": "debug_traceTransaction", "params": ["0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f", {"disableStack": true, "disableStorage": true}]}' localhost:8545 +``` + + + +###################################################### + + diff --git a/src/pages/docs/developers/dapp-developer/custom-tracer.md b/src/pages/docs/developers/dapp-developer/evm-tracing/custom-tracer.md similarity index 100% rename from src/pages/docs/developers/dapp-developer/custom-tracer.md rename to src/pages/docs/developers/dapp-developer/evm-tracing/custom-tracer.md diff --git a/src/pages/docs/developers/dapp-developer/evm-tracing/index.md b/src/pages/docs/developers/dapp-developer/evm-tracing/index.md new file mode 100644 index 0000000000..909c5fe84c --- /dev/null +++ b/src/pages/docs/developers/dapp-developer/evm-tracing/index.md @@ -0,0 +1,57 @@ +--- +title: EVM Tracing +description: Introduction to tracing EVM transactions using Geth +--- + +Tracing allows users to examine precisely what was executed by the EVM during some specific transaction or set of transactions. There are two different types of [transactions](https://ethereum.org/en/developers/docs/transactions) in Ethereum: value transfers and contract executions. A value transfer just moves ETH from one account to another. A contract interaction executes some code stored at a contract address which can include altering stored data and transacting multiple times with other contracts and externally-owned accounts. A contract execution transaction can therefore be a complicated web of interactions that can be difficult to unpick. The transaction receipt contains a status code that shows whether the transaction succeeded or failed, but more detailed information is not readily available, meaning it is very difficult to know what a contract execution actually did, what data was modified and which addresses were touched. 
This is the problem that EVM tracing solves. + +Geth traces transactions by re-running them locally and collecting data about precisely what was executed by the EVM. + +## State availability + +In its simplest form, tracing a transaction entails requesting the Ethereum node to reexecute the desired transaction with varying degrees of data collection and have it return an aggregated summary. In order for a Geth node to reexecute a transaction, all historical state accessed by the transaction must be available. This includes: + +- Balance, nonce, bytecode and storage of both the recipient as well as all internally invoked contracts. +- Block metadata referenced during execution of both the outer as well as all internally created transactions. +- Intermediate state generated by all preceding transactions contained in the same block as the one being traced. + +This means there are limits on the transactions that can be traced imposed by the synchronization and pruning configuration of a node: + +- An **archive** node retains **all historical data** back to genesis. It can therefore trace arbitrary transactions at any point in the history of the chain. Tracing a single transaction requires reexecuting all preceding transactions in the same block. + +- A **node synced from genesis** only retains the most recent 128 block states in memory. Older states are represented by a sequence of occasional checkpoints that intermediate states can be regenerated from. This means that states within the most recent 128 blocks are immediately available, whereas older states have to be regenerated from snapshots "on-the-fly". If the distance between the requested transaction and the most recent checkpoint is large, rebuilding the state can take a long time. Tracing a single transaction requires reexecuting all preceding transactions in the same block **and** all preceding blocks until the previous stored snapshot. 
+ +- A **snap synced** node holds the most recent 128 blocks in memory, so transactions in that range are always accessible. However, snap-sync only starts processing from a relatively recent block (as opposed to genesis for a full node). Between the initial sync block and the 128 most recent blocks, the node stores occasional checkpoints that can be used to rebuild the state on-the-fly. This means transactions can be traced back as far as the block that was used for the initial sync. Tracing a single transaction requires reexecuting all preceding transactions in the same block, + **and** all preceding blocks until the previous stored snapshot. + +- A **light synced** node retrieving data **on demand** can in theory trace transactions for which all required historical state is readily available in the network. This is because the data required to generate the trace is requested from an les-serving full node. In practice, data + availability **cannot** be reasonably assumed. + +More detailed information about syncing is available on the [sync modes page](/pages/docs/fundamentals/sync-modes.md). + +When a trace of a specific transaction is executed, the state is prepared by fetching the state of the parent block from the database. If it is not available, Geth will crawl backwards in time to find the next available state but only up to a limit defined in the `reexec` parameter which defaults to 128 blocks. If no state is available within the `reexec` window then the trace fails with `Error: required historical state unavailable` and the `reexec` parameter must be increased. If a valid state *is* found in the `reexec` window, then Geth sequentially re-executes the transactions in each block between the last available state and the target block. The greater the value of `reexec` the longer the tracing will take because more blocks have to be re-executed to regenerate the target state. 
+ +The `debug_getAccessibleStates` endpoint is a useful tool for estimating a suitable value for `reexec`. Passing the number of the block that contains the target transaction and a search distance to this endpoint will return the number of blocks behind the current head where the most recent available state exists. This value can be passed to the tracer as `reexec`. + +It is also possible to force Geth to store the state for specific sequences of blocks by stopping Geth and running it again with `--gcmode archive` for some period - this prevents state pruning for blocks that arrive while Geth is running with `--gcmode archive`. + +_There are exceptions to the above rules when running batch traces of entire blocks or chain segments. Those will be detailed later._ + +## Types of trace + +### Built-in tracers +The tracing API accepts an optional `tracer` parameter that defines how the data returned to the API call should be processed. If this parameter is omitted the default tracer is used. The default is the struct (or 'opcode') logger. These raw opcode traces are sometimes useful, but the returned data is very low level and can be too extensive and awkward to read for many use-cases. A full opcode trace can easily go into the hundreds of megabytes, making them very resource intensive to get out of the node and process externally. For these reasons, there is a set of non-default built-in tracers that can be named in the API call to return different data from the method. Under the hood, these tracers are Go or Javascript functions that do some specific preprocessing on the trace data before it is returned. + +More information about Geth's built-in tracers is available on the [built-in tracers](/pages/docs/developers/dapp-developer/evm-tracing/built-in-tracers.md) page. + + +### Custom tracers + +In addition to built-in tracers, it is possible to provide custom code that hooks to events in the EVM to process and return data in a consumable format. 
Custom tracers can be written either in Javascript or Go. JS tracers are good for quick prototyping and experimentation as well as for less intensive applications. Go tracers are performant but require the tracer to be compiled together with the Geth source code. In either case, developers only have to gather the data they actually need, and can do any processing at the source. + +More information about custom tracers is available on the [custom tracers](/pages/docs/developers/dapp-developer/evm-tracing/custom-tracer.md) page. + + +## Summary + +This page gave an introduction to the concept of tracing and explained issues around state availability. More detailed information on Geth's built-in and custom tracers can be found on their dedicated pages. \ No newline at end of file diff --git a/src/pages/docs/developers/dapp-developer/tracing.md b/src/pages/docs/developers/dapp-developer/tracing.md deleted file mode 100644 index 122ab94a97..0000000000 --- a/src/pages/docs/developers/dapp-developer/tracing.md +++ /dev/null @@ -1,155 +0,0 @@ ---- -title: EVM Tracing -description: Introduction to tracing EVM transactions using Geth ---- - -There are two different types of [transactions](https://ethereum.org/en/developers/docs/transactions) in Ethereum: simple value transfers and contract executions. A value transfer just moves Ether from one account to another. If however the recipient of a transaction is a contract account with associated [EVM](https://ethereum.org/en/developers/docs/evm) (Ethereum Virtual Machine) bytecode - beside transferring any Ether - the code will also be executed as part of the transaction. - -Having code associated with Ethereum accounts permits transactions to do arbitrarily complex data storage and enables them to act on the previously stored data by further transacting internally with outside accounts and contracts. This creates an interlinked ecosystem of contracts, where a single transaction can interact with tens or hundreds of accounts. 
- -The downside of contract execution is that it is very hard to say what a transaction actually did. A transaction receipt does contain a status code to check whether execution succeeded or not, but there is no way to see what data was modified, nor what external contracts where invoked. Geth resolves this by re-running transactions locally and collecting data about precisely what was executed by the EVM. This is known as "tracing" the transaction. - -## Tracing prerequisites - -In its simplest form, tracing a transaction entails requesting the Ethereum node to reexecute the desired transaction with varying degrees of data collection and have it return the aggregated summary for post processing. Reexecuting a transaction however has a few prerequisites to be met. - -In order for an Ethereum node to reexecute a transaction, all historical state accessed by the transaction must be available. This includes: - -- Balance, nonce, bytecode and storage of both the recipient as well as all internally invoked contracts. -- Block metadata referenced during execution of both the outer as well as all internally created transactions. -- Intermediate state generated by all preceding transactions contained in the same block as the one being traced. - -This means there are limits on the transactions that can be traced imposed by the synchronization and pruning configuration of a node. - -- An **archive** node retains **all historical data** back to genesis. It can therefore trace arbitrary transactions at any point in the history of the chain. Tracing a single transaction requires reexecuting all preceding transactions in the same block. - -- A **full synced** node retains the most recent 128 blocks in memory, so transactions in that range are always accessible. Full nodes also store occasional checkpoints back to genesis that can be used to rebuild the state at any point on-the-fly. 
This means older transactions can be traced but if there is a large distance between the requested transaction and the most recent checkpoint rebuilding the state can take a long time. Tracing a single transaction requires reexecuting all preceding transactions in the same block **and** all preceding blocks until the previous stored snapshot. - -- A **snap synced** node holds the most recent 128 blocks in memory, so transactions in that range are always accessible. However, snap-sync only starts processing from a relatively recent block (as opposed to genesis for a full node). Between the initial sync block and the 128 most recent blocks, the node stores occasional checkpoints that can be used to rebuild the state on-the-fly. This means transactions can be traced back as far as the block that was used for the initial sync. Tracing a single transaction requires reexecuting all preceding transactions in the same block, - **and** all preceding blocks until the previous stored snapshot. - -- A **light synced** node retrieving data **on demand** can in theory trace transactions for which all required historical state is readily available in the network. This is because the data required to generate the trace is requested from an les-serving full node. In practice, data - availability **cannot** be reasonably assumed. - -_There are exceptions to the above rules when running batch traces of entire blocks or chain segments. Those will be detailed later._ - -## Basic traces - -The simplest type of transaction trace that Geth can generate are raw EVM opcode traces. For every VM instruction the transaction executes, a structured log entry is emitted, containing all contextual metadata deemed useful. This includes the _program counter_, _opcode name_, _opcode cost_, _remaining gas_, _execution depth_ and any _occurred error_. The structured logs can optionally also contain the content of the _execution stack_, _execution memory_ and _contract storage_. 
- -The entire output of a raw EVM opcode trace is a JSON object having a few metadata fields: _consumed gas_, _failure status_, _return value_; and a list of _opcode entries_: - -```json -{ - "gas": 25523, - "failed": false, - "returnValue": "", - "structLogs": [] -} -``` - -An example log for a single opcode entry has the following format: - -```json -{ - "pc": 48, - "op": "DIV", - "gasCost": 5, - "gas": 64532, - "depth": 1, - "error": null, - "stack": [ - "00000000000000000000000000000000000000000000000000000000ffffffff", - "0000000100000000000000000000000000000000000000000000000000000000", - "2df07fbaabbe40e3244445af30759352e348ec8bebd4dd75467a9f29ec55d98d" - ], - "memory": [ - "0000000000000000000000000000000000000000000000000000000000000000", - "0000000000000000000000000000000000000000000000000000000000000000", - "0000000000000000000000000000000000000000000000000000000000000060" - ], - "storage": {} -} -``` - -### Generating basic traces - -To generate a raw EVM opcode trace, Geth provides a few [RPC API endpoints](/docs/rpc/ns-debug). The most commonly used is [`debug_traceTransaction`](/docs/rpc/ns-debug#debug_tracetransaction). - -In its simplest form, `traceTransaction` accepts a transaction hash as its only argument. It then traces the transaction, aggregates all the generated data and returns it as a **large** JSON object. A sample invocation from the Geth console would be: - -```js -debug.traceTransaction('0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f'); -``` - -The same call can also be invoked from outside the node too via HTTP RPC (e.g. using Curl). In this case, the HTTP endpoint must be enabled in Geth using the `--http` command and the `debug` API namespace must be exposed using `--http.api=debug`. 
- -``` -$ curl -H "Content-Type: application/json" -d '{"id": 1, "method": "debug_traceTransaction", "params": ["0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f"]}' localhost:8545 -``` - -To follow along with this tutorial, transaction hashes can be found from a local Geth node (e.g. by attaching a [Javascript console](/docs/interface/javascript-console) and running `eth.getBlock('latest')` then passing a transaction hash from the returned block to `debug.traceTransaction()`) or from a block explorer (for [Mainnet](https://etherscan.io/) or a [testnet](https://goerli.etherscan.io/)). - -It is also possible to configure the trace by passing Boolean (true/false) values for four parameters that tweak the verbosity of the trace. By default, the _EVM memory_ and _Return data_ are not reported but the _EVM stack_ and _EVM storage_ are. To report the maximum amount of data: - -```shell -enableMemory: true -disableStack: false -disableStorage: false -enableReturnData: true -``` - -An example call, made in the Geth Javascript console, configured to report the maximum amount of data looks as follows: - -```js -debug.traceTransaction('0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f', { - enableMemory: true, - disableStack: false, - disableStorage: false, - enableReturnData: true -}); -``` - -Running the above operation on the Rinkeby network (with a node retaining enough history) will result in this [trace dump](https://gist.github.com/karalabe/c91f95ac57f5e57f8b950ec65ecc697f). - -Alternatively, disabling _EVM Stack_, _EVM Memory_, _Storage_ and _Return data_ (as demonstrated in the Curl request below) results in the following, much shorter, [trace dump](https://gist.github.com/karalabe/d74a7cb33a70f2af75e7824fc772c5b4). 
- -``` -$ curl -H "Content-Type: application/json" -d '{"id": 1, "method": "debug_traceTransaction", "params": ["0xfc9359e49278b7ba99f59edac0e3de49956e46e530a53c15aa71226b7aa92c6f", {"disableStack": true, "disableStorage": true}]}' localhost:8545 -``` - -### Limits of basic traces - -Although the raw opcode traces generated above are useful, having an individual log entry for every single opcode is too low level for most use cases, and will require developers to create additional tools to post-process the traces. Additionally, a full opcode trace can easily go into the hundreds of megabytes, making them very resource intensive to get out of the node and process externally. - -To avoid those issues, Geth supports running custom JavaScript tracers _within_ the Ethereum node, which have full access to the EVM stack, memory and contract storage. This means developers only have to gather the data they actually need, and do any processing at the source. - -## Pruning - -Geth does in-memory state-pruning by default, discarding state entries that it deems no longer necessary to maintain. This is configured via the `--gcmode` command. An error message alerting the user that the necessary state is not available is common in EVM tracing on -anything other than an archive node. - -```sh -Error: required historical state unavailable (reexec=128) - at web3.js:6365:37(47) - at send (web3,js:5099:62(35)) - at :1:23(13) -``` - -The pruning behaviour, and consequently the state availability and tracing capability of a node depends on its sync and pruning configuration. The 'oldest' block after which state is immediately available, and before which state is not immediately available, is known as the "pivot block". There are then several possible cases for a trace request on a Geth node. - -For tracing a transaction in block `B` where the pivot block is `P` can regenerate the desired state by replaying blocks from the last: - -1. 
a fast-sync'd node can regenerate the desired state by replaying blocks from the most recent checkpoint between `P` and `B` as long as `P` < `B`. If `P` > `B` there is no available checkpoint and the state cannot be regenerated without replying the chain from genesis. - -2. a fully sync'd node can regenerate the desired state by replaying blocks from the last available full state before `B`. A fully sync'd node re-executes all blocks from genesis, so checkpoints are available across the entire history of the chain. However, database pruning discards older data, moving `P` to a more recent position in the chain. If `P` > `B` there is no available checkpoint and the state cannot be regenerated without replaying the chain from genesis. - -3. A fully-sync'd node without pruning (i.e. an archive node configured with `--gcmode=archive`) does not need to replay anything, it can immediately load up any state and serve the request for any `B`. - -The time taken to regenerate a specific state increases with the distance between `P` and `B`. If the distance between `P` and `B` is large, the regeneration time can be substantial. - -## Summary - -This page covered the concept of EVM tracing and how to generate traces with the default opcode-based tracers using RPC. More advanced usage is possible, including using other built-in tracers as well as writing [custom tracing](/docs/dapp/custom-tracer) code in Javascript and Go. The API as well as the JS tracing hooks are defined in [the reference](/docs/rpc/ns-debug#debug_traceTransaction). - -[evm]: