From 4962b5a7ce4a73e404fe6eee57676eedb4ea3096 Mon Sep 17 00:00:00 2001
From: Joseph Cook <33655003+jmcook1186@users.noreply.github.com>
Date: Fri, 2 Sep 2022 13:19:19 +0100
Subject: [PATCH] docs: add page on pruning a geth node (#25602)

Adds a page with brief instructions for pruning a geth node.
Also intended for use on new site.
---
 docs/_interface/pruning.md | 91 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)
 create mode 100644 docs/_interface/pruning.md

diff --git a/docs/_interface/pruning.md b/docs/_interface/pruning.md
new file mode 100644
index 0000000000..22792d9037
--- /dev/null
+++ b/docs/_interface/pruning.md
@@ -0,0 +1,91 @@
+---
+title: Pruning
+sort_key: F
+---
+
+
+{% include note.html content="Offline pruning is only for the hash-based state scheme.
+A path-based state scheme that enables pruning by default is planned.
+Once the hash-based state scheme is no longer supported, offline pruning will be deprecated." %}
+
+
+A snap-sync'd Geth node currently requires more than 650 GB of disk space to store the
+historic blockchain data. With the default cache size, the database grows by about 14 GB/week.
+This means that Geth users will rapidly run out of space on 1 TB hard drives. To solve this
+problem without needing to purchase additional hardware, Geth can be pruned. Pruning is the
+process of erasing older data to save disk space. Since Geth `v1.10`, users have been able
+to trigger an offline snapshot prune to bring the total storage back down to the original
+~650 GB in about 4-5 hours. This has to be done periodically to keep the total disk usage
+within the bounds of the local hardware (e.g. every month or so for a 1 TB disk).
+
+To prune a Geth node, at least 40 GB of free disk space is recommended. This means pruning
+cannot be used to rescue a hard drive that has been completely filled. A good rule of thumb
+is to prune before the node fills ~80% of the available disk space.
+
+## Pruning rules
+
+1) Do not try to prune an archive node. Archive nodes need to maintain ALL historic data by
+   definition.
+2) Ensure there is at least 40 GB of storage space still available on the disk that will be
+   pruned. Failures have been reported with ~25 GB of free space.
+3) Use Geth `v1.10` or later, ideally later than `v1.10.3`.
+4) Make sure Geth is fully sync'd.
+5) Wait until Geth has finished creating a snapshot that is at least 128 blocks old. This is
+   true when "state snapshot generation" is no longer reported in the logs.
+
+With these rules satisfied, Geth's database can be pruned.
+
+## How pruning works
+
+Pruning uses snapshots of the state database as an indicator to determine which
+nodes in the state trie can be kept and which ones are stale and can be discarded. Geth
+identifies the target state trie based on a stored snapshot layer that has at least 128 block
+confirmations on top (to survive reorgs), discarding any data that isn't part of the target state trie or genesis state.
+
+Geth prunes the database in three stages:
+
+1) Iterating state snapshot: Geth iterates the bottom-most snapshot layer and constructs a bloom filter set for identifying the target trie nodes.
+2) Pruning state data: Geth deletes stale trie nodes from the database which are not in the bloom filter set.
+3) Compacting database: Geth tidies up the new database to reclaim free space.
+
+There may be a period of >1 hour during the "Compacting database" stage with no log messages at all.
+This is normal, and the pruning should be left to run until a log message containing the
+phrase `State pruning successful` appears (i.e. do not restart Geth before then!). That message
+indicates that the pruning is complete and Geth can be started.
+
+## Pruning command
+
+For a normal Geth node, stop Geth and then run the following command to start an
+offline state prune:
+
+```sh
+geth snapshot prune-state
+```
+
+For a Geth node run using `systemd`:
+
+```sh
+sudo systemctl stop geth # stop geth, wait >3 mins to ensure clean shutdown
+tmux # tmux enables pruning to keep running even if you disconnect
+sudo -u <user> geth --datadir <path> snapshot prune-state # wait for pruning to finish
+sudo systemctl start geth # restart geth
+```
+
+The pruning could take 4-5 hours to complete. Once finished, restart Geth.
+
+
+## Troubleshooting
+
+Messages about "state snapshot generation" indicate that a snapshot is not fully generated.
+This suggests that either the `--datadir` is not correct or Geth did not have enough time to
+complete the snapshot generation before the pruning began. In either case, the best course of
+action is to stop Geth, run it normally again (no pruning) until the snapshot is definitely
+complete and at least 128 blocks exist on top of it, then try pruning again.
+
+## Further Reading
+
+[Ethereum Foundation blog post for Geth v1.10.0](https://blog.ethereum.org/2021/03/03/geth-v1-10-0/)
+ 
+[Pruning Geth guide (@yorickdowne)](https://gist.github.com/yorickdowne/3323759b4cbf2022e191ab058a4276b2)
+ 
+[Pruning Geth in a RocketPool node](https://docs.rocketpool.net/guides/node/geth-pruning.html)