From 4962b5a7ce4a73e404fe6eee57676eedb4ea3096 Mon Sep 17 00:00:00 2001
From: Joseph Cook <33655003+jmcook1186@users.noreply.github.com>
Date: Fri, 2 Sep 2022 13:19:19 +0100
Subject: [PATCH] docs: add page on pruning a geth node (#25602)

Adds a page with brief instructions for pruning a geth node. Also intended for use on new site.
---
 docs/_interface/pruning.md | 91 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)
 create mode 100644 docs/_interface/pruning.md

diff --git a/docs/_interface/pruning.md b/docs/_interface/pruning.md
new file mode 100644
index 0000000000..22792d9037
--- /dev/null
+++ b/docs/_interface/pruning.md
@@ -0,0 +1,91 @@
+---
+title: Pruning
+sort key: F
+---
+
+
+{% include note.html content="Offline pruning is only for the hash-based state scheme.
+Soon, we will have a path-based state scheme which enables pruning by default.
+Once the hash-based state scheme is no longer supported, offline pruning will be deprecated." %}
+
+
+A snap-sync'd Geth node currently requires more than 650 GB of disk space to store the
+historic blockchain data. With the default cache size, the database grows by about 14 GB/week,
+so Geth users will rapidly run out of space on 1 TB hard drives. To solve this problem
+without purchasing additional hardware, Geth can be pruned. Pruning is the process of
+erasing older data to save disk space. Since Geth `v1.10`, users have been able to trigger
+an offline snapshot prune that brings the total storage back down to the original ~650 GB
+in about 4-5 hours. This has to be done periodically to keep the total disk usage within
+the bounds of the local hardware (e.g. every month or so for a 1 TB disk).
+
+At least 40 GB of free disk space is recommended for pruning a Geth node. This means
+pruning cannot be used to rescue a hard drive that is already completely full. A good
+rule of thumb is to prune before the node fills ~80% of the available disk space.
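+
+The two guidelines above can be checked before starting a prune. The script below is a minimal
+sketch, not part of Geth. It assumes GNU coreutils `df` (for `--output`) and reads the data directory
+from a hypothetical `DATADIR` variable, which defaults to `~/.ethereum`, Geth's default on Linux.
+It compares whole GiB against the approximate 40 GB guideline.
+
+```sh
+#!/bin/sh
+# Pre-prune disk check: an illustrative sketch, not part of Geth.
+# Assumes GNU coreutils df (for --output). DATADIR is a hypothetical variable
+# naming the Geth data directory; it defaults to ~/.ethereum here.
+
+MIN_FREE_GB=40   # recommended minimum free space before pruning
+MAX_USED_PCT=80  # rule of thumb: prune before the disk is ~80% full
+
+# prune_advice FREE_GIB USED_PCT: print a verdict, return 1 if pruning is not advised.
+prune_advice() {
+  if [ "$1" -lt "$MIN_FREE_GB" ]; then
+    echo "only $1 GiB free: pruning is not recommended"
+    return 1
+  elif [ "$2" -ge "$MAX_USED_PCT" ]; then
+    echo "disk is $2% full: prune now"
+  else
+    echo "disk is $2% full: no need to prune yet"
+  fi
+}
+
+datadir="${DATADIR:-$HOME/.ethereum}"
+if [ -d "$datadir" ]; then
+  # Free space in 1K blocks (converted to GiB) and usage with the '%' sign stripped.
+  avail_kb=$(df -k --output=avail "$datadir" | tail -n 1 | tr -d ' ')
+  used_pct=$(df --output=pcent "$datadir" | tail -n 1 | tr -d ' %')
+  prune_advice "$((avail_kb / 1048576))" "$used_pct"
+fi
+```
+
+The script can be saved under any name, such as the hypothetical `check-disk.sh`. It can then be run as
+`DATADIR=/path/to/datadir sh check-disk.sh` while Geth is still running, since it only reads
+disk statistics.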
+
+## Pruning rules
+
+1) Do not try to prune an archive node. Archive nodes need to maintain ALL historic data by
+   definition.
+2) Ensure there is at least 40 GB of storage space still available on the disk that will be
+   pruned. Failures have been reported with ~25 GB of free space.
+3) Geth is at least `v1.10`, ideally later than `v1.10.3`.
+4) Geth is fully sync'd.
+5) Geth has finished creating a snapshot that is at least 128 blocks old. This is true when
+   "state snapshot generation" is no longer reported in the logs.
+
+With these rules satisfied, Geth's database can be pruned.
+
+## How pruning works
+
+Pruning uses snapshots of the state database as an indicator to determine which nodes in the
+state trie can be kept and which ones are stale and can be discarded. Geth identifies the target
+state trie from a stored snapshot layer with at least 128 block confirmations on top (so that it
+survives reorgs), and discards any data that is not part of the target state trie or genesis state.
+
+Geth prunes the database in three stages:
+
+1) Iterating state snapshot: Geth iterates the bottom-most snapshot layer and constructs a bloom filter set for identifying the target trie nodes.
+2) Pruning state data: Geth deletes from the database any trie nodes that are not in the bloom filter set.
+3) Compacting database: Geth compacts the database to reclaim the freed space.
+
+There may be a period of more than an hour during the compacting database stage with no log messages at all.
+This is normal: leave the pruning running until a log message containing the
+phrase `State pruning successful` appears (do not restart Geth before then!). That message indicates
+that pruning is complete and Geth can be started.
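+
+Because `State pruning successful` is the signal that pruning has finished, it can help to keep
+the prune's output in a file and check for that phrase explicitly before restarting Geth. The
+following is a minimal sketch. It assumes a POSIX shell and that Geth has already been stopped.
+The function names and the log file are hypothetical, chosen only for this example:
+
+```sh
+# Sketch: run an offline prune, keep a copy of its log output, and check for the
+# completion message. Geth must already be stopped. pruned_ok, prune_and_check and
+# the log file argument are hypothetical names used only for this example.
+
+# pruned_ok LOGFILE: succeed only if LOGFILE contains the completion message.
+pruned_ok() {
+  grep -q "State pruning successful" "$1"
+}
+
+# prune_and_check LOGFILE [GLOBAL_FLAGS...]: extra arguments (e.g. --datadir) go before
+# the subcommand. Geth logs to stderr, hence 2>&1 so that tee captures the messages.
+prune_and_check() {
+  log="$1"
+  shift
+  geth "$@" snapshot prune-state 2>&1 | tee "$log"
+  if pruned_ok "$log"; then
+    echo "pruning complete: Geth can be restarted"
+  else
+    echo "no completion message in $log: do not restart Geth yet" >&2
+    return 1
+  fi
+}
+```
+
+First load the functions into the shell, for example with `. ./prune-check.sh` (a hypothetical file name).
+A run such as `prune_and_check prune.log --datadir /path/to/datadir` inside `tmux` then leaves a log
+that can be inspected later if the completion message is missing.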
+
+## Pruning command
+
+For a normal Geth node, stop Geth and then run the following command to start an
+offline state prune:
+
+```sh
+geth snapshot prune-state
+```
+
+For a Geth node run using `systemd`:
+
+```sh
+sudo systemctl stop geth # stop geth, wait >3 mins to ensure a clean shutdown
+tmux # tmux enables pruning to keep running even if you disconnect
+sudo -u geth geth --datadir /path/to/datadir snapshot prune-state # use the real datadir; wait for pruning to finish
+sudo systemctl start geth # restart geth
+```
+
+Pruning can take 4-5 hours to complete. Once it has finished, restart Geth.
+
+
+## Troubleshooting
+
+Messages about "state snapshot generation" indicate that the snapshot has not been fully generated.
+This suggests either that the `--datadir` is incorrect, or that Geth did not have enough time to
+finish generating the snapshot and pruning began before the snapshot was complete. In either case,
+the best course of action is to stop Geth, run it normally again (without pruning) until the snapshot
+is definitely complete and has at least 128 blocks on top of it, then try pruning again.
+
+## Further Reading
+
+[Ethereum Foundation blog post for Geth v1.10.0](https://blog.ethereum.org/2021/03/03/geth-v1-10-0/)
+
+[Pruning Geth guide (@yorickdowne)](https://gist.github.com/yorickdowne/3323759b4cbf2022e191ab058a4276b2)
+
+[Pruning Geth in a RocketPool node](https://docs.rocketpool.net/guides/node/geth-pruning.html)