docs: metrics page (#25002)

* update metrics page

* fix sort key

* fix typos

* minor fixes

Co-authored-by: Sina Mahmoodi <itz.s1na@gmail.com>
pull/25047/head
Joseph Cook 2 years ago committed by GitHub
parent 3e629777c1
commit d6eceee791
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
  1. 162
      docs/_interface/metrics.md

@ -1,47 +1,107 @@
---
title: Metrics
sort_key: C
sort_key: G
---
## Meters and Timers
Note, metrics collection is disabled by default in order not to incur reporting overhead for the average user. The flag `--metrics` must therefore be used to enable the basic metrics, and the flag `--metrics.expensive` can be used to enable certain metrics that are deemed 'expensive' from a resource-consumption perspective. An example of an expensive metric is per-packet network traffic data.
Geth includes a variety of optional metrics that can be reported to the user. However, metrics are disabled by default to save on the computational overhead for the average user. Users that choose to see more detailed metrics can enable them using the `--metrics` flag when starting Geth. Some metrics are classed as especially expensive and are only enabled when the `--metrics.expensive` flag is supplied. For example, per-packet network traffic data is considered expensive.
The goal of the Geth metrics system is that - similar to logs - we should be able to add arbitrary metric collection to any part of the code without requiring fancy constructs to analyze them (counter variables, public interfaces, crossing over the APIs, console hooks, etc). Instead, we should just "update" metrics whenever and wherever needed, and have them automatically collected, surfaced through the APIs, queryable and visualizable for analysis.
The goal of the Geth metrics system is that - similar to logs - arbitrary metric collections can be added to any part of the code without requiring fancy constructs to analyze them (counter variables, public interfaces, crossing over the APIs, console hooks, etc). Instead, metrics should be "updated" whenever and wherever needed and be automatically collected, surfaced through the APIs, queryable and visualizable for analysis.
To that end, Geth currently implements the following types of metrics:
* **Meters**: Analogous to physical meters (electricity, water, etc), they are capable of measuring the *amount* of "things" that pass through and at the *rate* at which they do that. A meter doesn't have a specific unit of measure (byte, block, malloc, etc), it just counts arbitrary *events*. At any point in time it can report:
* *Total number of events* that passed through the meter
* *Mean throughput rate* of the meter since startup (events / second)
* *Weighted throughput rate* in the last *1*, *5* and *15* minutes (events / second)
* (*"weighted" means that recent seconds count more than older ones*)
* **Timers**: Extension of *meters*, where not only the occurrence of some event is measured, its *duration* is also collected. Similarly to meters, a timer can also measure arbitrary events, but each requires a duration to be assigned individually. Beside **all** the reports a meter can generate, a timer has additionally:
* *Percentiles (5, 20, 50, 80, 95)*, reporting that some percentage of the events took less than the reported time to execute (*e.g. Percentile 20 = 1.5s would mean that 20% of the measured events took less than 1.5 seconds to execute; consequently 80% (= 100% - 20%) took more than 1.5s*)
* Percentile 5: minimum durations (this is as fast as it gets)
* Percentile 50: well behaved samples (boring, just to give an idea)
* Percentile 80: general performance (these should be optimised)
* Percentile 95: worst case outliers (rare, just handle gracefully)
* **Counters**: A counter holds a single int64 value that can be incremented and decremented. The current value of the counter can be queried.
* **Gauges**: A gauge measures a single int64 value. In addition to being incremented and decremented, as with a counter, a gauge can be set to an arbitrary value.
## Metric types
Geth's metrics can be classified into four types: meters, timers, counters and gauges.
### Meters
Analogous to physical meters (electricity, water, etc), Geth's meters are capable of measuring the *amount* of "things" that pass through and the *rate* at which they do so. A meter doesn't have a specific unit of measure (byte, block, malloc, etc); it just counts arbitrary *events*. At any point in time a meter can report:
* *Total number of events* that passed through the meter
* *Mean throughput rate* of the meter since startup (events / second)
* *Weighted throughput rate* in the last *1*, *5* and *15* minutes (events / second)
("weighted" means that recent seconds count more than older ones)
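To make the idea of a "weighted" rate concrete, the following is a minimal, stdlib-only sketch of an exponentially weighted moving average. It is not taken from the Geth source: the `ewma` type, its methods, the 5-second tick interval and the formula used for `alpha` are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// ewma is a hypothetical exponentially weighted moving average of an event
// rate. It illustrates the concept only and is not Geth's implementation.
type ewma struct {
	alpha       float64 // weight given to the newest interval, in (0, 1)
	interval    float64 // seconds between ticks
	rate        float64 // smoothed rate in events per second
	uncounted   int64   // events marked since the last tick
	initialized bool    // false until the first tick has seeded the rate
}

// newEWMA builds an average over roughly windowMinutes, ticked every
// intervalSeconds. The alpha formula is an assumption made for this sketch.
func newEWMA(windowMinutes, intervalSeconds float64) *ewma {
	return &ewma{
		alpha:    1 - math.Exp(-intervalSeconds/(60*windowMinutes)),
		interval: intervalSeconds,
	}
}

// mark records n events that occurred in the current interval.
func (e *ewma) mark(n int64) { e.uncounted += n }

// tick closes the current interval and folds its rate into the average.
// Older intervals are discounted by (1 - alpha) on every tick, so recent
// seconds count more than older ones.
func (e *ewma) tick() {
	instant := float64(e.uncounted) / e.interval
	e.uncounted = 0
	if !e.initialized {
		e.rate, e.initialized = instant, true
		return
	}
	e.rate += e.alpha * (instant - e.rate)
}

func main() {
	oneMinute := newEWMA(1, 5)
	fifteenMinute := newEWMA(15, 5)
	for _, e := range []*ewma{oneMinute, fifteenMinute} {
		e.mark(50) // 50 events in the first 5 seconds
		e.tick()
		e.tick() // an idle interval pulls the rate down
	}
	fmt.Printf("1-minute rate:  %.3f events/s\n", oneMinute.rate)
	fmt.Printf("15-minute rate: %.3f events/s\n", fifteenMinute.rate)
}
```

In this sketch the 1-minute average reacts to the idle interval faster than the 15-minute average, which is the behaviour the three weighted rates are meant to expose.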
### Timers
Timers are extensions of *meters*: the *duration* of an event is collected alongside a log of its occurrence. Similarly to meters, a timer can also measure arbitrary events, but each requires a duration to be assigned individually. In addition to generating all of the meter report types, a timer also reports:
* *Percentiles (5, 20, 50, 80, 95)*, reporting that some percentage of the events took less than the reported time to execute (*e.g. Percentile 20 = 1.5s would mean that 20% of the measured events took less than 1.5 seconds to execute; consequently 80% (= 100% - 20%) took more than 1.5s*)
* Percentile 5: minimum durations (this is as fast as it gets)
* Percentile 50: well behaved samples (boring, just to give an idea)
* Percentile 80: general performance (these should be optimised)
* Percentile 95: worst case outliers (rare, just handle gracefully)
### Counters
A counter is a single int64 value that can be incremented and decremented. The current value of the counter can be queried.
### Gauges
A gauge is a single int64 value. Its value can increment and decrement - as with a counter - but can also be set arbitrarily.
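The difference between the two can be sketched with plain `sync/atomic` operations. The `counter` and `gauge` types below, and their method names, are hypothetical stand-ins written for this example rather than the types in Geth's `metrics` package.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// counter is a hypothetical int64 counter: it can only move by increments
// and decrements, and its current value can be read.
type counter struct{ value int64 }

func (c *counter) inc(n int64)  { atomic.AddInt64(&c.value, n) }
func (c *counter) dec(n int64)  { atomic.AddInt64(&c.value, -n) }
func (c *counter) count() int64 { return atomic.LoadInt64(&c.value) }

// gauge is a hypothetical int64 gauge: like a counter it can be incremented
// and decremented, but it can also be overwritten with an arbitrary value.
type gauge struct{ value int64 }

func (g *gauge) inc(n int64)    { atomic.AddInt64(&g.value, n) }
func (g *gauge) dec(n int64)    { atomic.AddInt64(&g.value, -n) }
func (g *gauge) update(v int64) { atomic.StoreInt64(&g.value, v) }
func (g *gauge) read() int64    { return atomic.LoadInt64(&g.value) }

func main() {
	var requests counter
	requests.inc(3)
	requests.dec(1)
	fmt.Println("requests:", requests.count())

	var peers gauge
	peers.update(25) // set directly to the latest observed value
	peers.inc(2)
	fmt.Println("peers:", peers.read())
}
```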
## Querying metrics
Geth collects metrics if the `--metrics` flag is provided at startup. Those metrics are available via an HTTP server if the `--metrics.addr` flag is also provided. By default the metrics are served at `127.0.0.1:6060/debug/metrics` but a custom IP address can be provided. A custom port can also be provided to the `--metrics.port` flag. More computationally expensive metrics are toggled on or off by providing or omitting the `--metrics.expensive` flag. For example, to serve all metrics at the default address and port:
```
geth <other commands> --metrics --metrics.addr 127.0.0.1 --metrics.expensive
```
Navigating the browser to the given metrics address displays all the available metrics in the form
of JSON data that looks similar to:
```
chain/account/commits.50-percentile: 374072
chain/account/commits.75-percentile: 830356
chain/account/commits.95-percentile: 1783005.3999976
chain/account/commits.99-percentile: 3991806
chain/account/commits.99.999-percentile: 3991806
chain/account/commits.count: 43
chain/account/commits.fifteen-minute: 0.029134344092314267
chain/account/commits.five-minute: 0.029134344092314267
...
```
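The same data can also be read programmatically. The sketch below assumes that the endpoint returns a single JSON object keyed by metric name, as described above, and that Geth is running locally with `--metrics --metrics.addr 127.0.0.1`. The helper names `fetchMetrics` and `decodeMetrics` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// decodeMetrics parses a JSON object of metric names to values. Values are
// kept as interface{} because the exact value types are an assumption here.
func decodeMetrics(body []byte) (map[string]interface{}, error) {
	var out map[string]interface{}
	if err := json.Unmarshal(body, &out); err != nil {
		return nil, err
	}
	return out, nil
}

// fetchMetrics downloads and decodes the metrics served at url.
func fetchMetrics(url string) (map[string]interface{}, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return decodeMetrics(body)
}

func main() {
	// Default address from this page; adjust if a custom address or port is used.
	m, err := fetchMetrics("http://127.0.0.1:6060/debug/metrics")
	if err != nil {
		fmt.Println("could not fetch metrics:", err)
		return
	}
	fmt.Println("chain/account/commits.count =", m["chain/account/commits.count"])
}
```

If no metrics server is listening, the program reports the connection error and exits rather than panicking.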
Any developer is free to add, remove or modify the available metrics as they see fit. The precise list of available metrics is always available by opening the metrics server in the browser.
Geth also supports dumping metrics directly into an InfluxDB database. To activate this, the `--metrics.influxdb` flag must be provided at startup. The API endpoint, username, password and other InfluxDB tags can also be provided. The available flags are:
```
--metrics.influxdb.endpoint value InfluxDB API endpoint to report metrics to (default: "http://localhost:8086")
--metrics.influxdb.database value InfluxDB database name to push reported metrics to (default: "geth")
--metrics.influxdb.username value Username to authorize access to the database (default: "test")
--metrics.influxdb.password value Password to authorize access to the database (default: "test")
--metrics.influxdb.tags value Comma-separated InfluxDB tags (key/values) attached to all measurements (default: "host=localhost")
--metrics.influxdbv2 Enable metrics export/push to an external InfluxDB v2 database
--metrics.influxdb.token value Token to authorize access to the database (v2 only) (default: "test")
--metrics.influxdb.bucket value InfluxDB bucket name to push reported metrics to (v2 only) (default: "geth")
--metrics.influxdb.organization value InfluxDB organization name (v2 only) (default: "geth")
```
## Creating and updating metrics
Metrics can be added easily in the code:
Metrics can be added easily in the Geth source code:
```go
meter := metrics.NewMeter("system/memory/allocs")
timer := metrics.NewTimer("chain/inserts")
```
In order to use the same meter from two different packages without creating dependency cycles, the metrics can be created using `NewOrRegisteredX()` functions.
This creates a new meter if no meter with this name is available or returns the existing meter.
In order to use the same meter from two different packages without creating dependency cycles, the metrics can be created using `NewOrRegisteredX()` functions. This creates a new meter if no meter with this name is available or returns the existing meter.
```go
meter := metrics.NewOrRegisteredMeter("system/memory/allocs")
timer := metrics.NewOrRegisteredTimer("chain/inserts")
```
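The get-or-create behaviour described for the `NewOrRegisteredX()` functions can be sketched with a small name-keyed registry. The `registry` and `meter` types below are simplified stand-ins invented for this example; they are not the types used in the Geth source.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// meter is a hypothetical, stripped-down meter that only counts events.
type meter struct{ count int64 }

func (m *meter) mark(n int64) { atomic.AddInt64(&m.count, n) }
func (m *meter) total() int64 { return atomic.LoadInt64(&m.count) }

// registry maps metric names to meters so that independent packages can
// share a meter by name without importing each other.
type registry struct {
	mu     sync.Mutex
	meters map[string]*meter
}

func newRegistry() *registry {
	return &registry{meters: make(map[string]*meter)}
}

// newOrRegisteredMeter returns the meter registered under name, creating
// and registering it first if it does not exist yet.
func (r *registry) newOrRegisteredMeter(name string) *meter {
	r.mu.Lock()
	defer r.mu.Unlock()
	if m, ok := r.meters[name]; ok {
		return m
	}
	m := &meter{}
	r.meters[name] = m
	return m
}

func main() {
	reg := newRegistry()
	a := reg.newOrRegisteredMeter("system/memory/allocs") // e.g. from package A
	b := reg.newOrRegisteredMeter("system/memory/allocs") // e.g. from package B
	a.mark(3)
	b.mark(2)
	fmt.Println(a == b, b.total()) // prints "true 5"
}
```

Because both callers receive the same pointer, events marked from either side accumulate in a single meter.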
The name can be any arbitrary string, however since Geth assumes it to be some meaningful sub-system hierarchy, please name accordingly. Metrics can then be updated equally simply:
The name given to the metric can be any arbitrary string. However, since Geth assumes it to be some meaningful sub-system hierarchy, it should be named accordingly.
Metrics can then be updated:
```go
meter.Mark(n) // Record the occurrence of `n` events
@ -51,60 +111,6 @@ timer.UpdateSince(time) // Record an event that started at `time`
timer.Time(function) // Measure and record the execution of `function`
```
## Querying metrics
Geth exposes all collected metrics at `127.0.0.1:6060/debug/metrics`.
For collecting metrics you need to add the `--metrics` flag. In order to start the metric server you need to specify the `--metrics.addr` flag.
Geth also supports dumping metrics directly into an influx database. In order to activate this, you need to specify the `--metrics.influxdb` flag. You can specify the API endpoint as well as password and username and other influxdb tags.
## Summary
## Available metrics
Metrics are a debugging tool, with every developer being free to add, remove or modify them as seen fit. As they can change between commits, the exactly available ones can be queried by opening `127.0.0.1:6060/debug/metrics` in your browser. A few however may warrant longevity, so feel free to add to the below list if you feel it's worth a more general audience:
```
* system/
* memory/
* allocs: number of memory allocations made
* used: amount of memory currently used
* held: memory allocated on the heap
* pauses: garbage collector pauses
* frees: number of memory allocations freed
* cpu/
* sysload: time spent by CPU on all processes
* syswait: time spent waiting on disk i/o
* procload: time spent by CPU on this process
* threads: number of threads
* goroutines: number of goroutines
* disk/
* readcount: number of read operations
* readdata: total number of bytes read
* readbytes: counter of bytes read
* writecount: number of write operations
* writedata: total number of bytes written
* writebytes: counter of bytes written
* rpc/
* requests: number of requests
* success: number of successful requests
* failure: number of failed requests
* chain/
* reorg/
* drop: blocks dropped by reorg
* add: blocks added by reorg
* head/
* block: currently newest block
* header: currently newest header
* receipt: currently newest receipt
* p2p/
* peers: number of connected peers
* ingress: inbound traffic in bytes
* egress: outbound traffic in bytes
* txpool/
* pending: currently pending transactions
* local: number of transactions sent from this node
```
Geth can be configured to report metrics to an HTTP server or database. These functions are disabled by default but can be enabled by passing the appropriate flags on startup. Users can easily create custom metrics by adding them to the Geth source code, following the instructions on this page.
