Node discovery periodically revalidates the nodes in its table by sending PING, checking
if they are still alive. I recently noticed some issues with the implementation of this
process, which can cause strange results such as nodes dropping unexpectedly, certain
nodes not getting revalidated often enough, and bad results being returned to incoming
FINDNODE queries.
In this change, the revalidation process is improved with the following logic:
- We maintain two 'revalidation lists' containing the table nodes, named 'fast' and 'slow'.
- The process chooses random nodes from each list on a randomized interval, the interval being
faster for the 'fast' list, and performs revalidation for the chosen node.
- Whenever a node is newly inserted into the table, it goes into the 'fast' list.
Once validation passes, it transfers to the 'slow' list. If a request fails, or the
node changes endpoint, it transfers back into 'fast'.
- livenessChecks is incremented by one for successful checks. Unlike the old implementation,
we will not drop the node on the first failing check. We instead quickly decay the
livenessChecks give it another chance.
- Order of nodes in bucket doesn't matter anymore.
I am also adding a debug API endpoint to dump the node table content.
Co-authored-by: Martin HS <martin@swende.se>
* p2p/discover: add liveness check in collectTableNodes
* p2p/discover: fix test
* p2p/discover: rename to appendLiveNodes
* p2p/discover: add dedup logic back
* p2p/discover: simplify
* p2p/discover: fix issue found by test
This PR replaces Geth's logger package (a fork of [log15](https://github.com/inconshreveable/log15)) with an implementation using slog, a logging library included as part of the Go standard library as of Go1.21.
Main changes are as follows:
* removes any log handlers that were unused in the Geth codebase.
* Json, logfmt, and terminal formatters are now slog handlers.
* Verbosity level constants are changed to match slog constant values. Internal translation is done to make this opaque to the user and backwards compatible with existing `--verbosity` and `--vmodule` options.
* `--log.backtraceat` and `--log.debug` are removed.
The external-facing API is largely the same as the existing Geth logger. Logger method signatures remain unchanged.
A small semantic difference is that a `Handler` can only be set once per `Logger` and not changed dynamically. This just means that a new logger must be instantiated every time the handler of the root logger is changed.
----
For users of the `go-ethereum/log` module. If you were using this module for your own project, you will need to change the initialization. If you previously did
```golang
log.Root().SetHandler(log.LvlFilterHandler(log.LvlInfo, log.StreamHandler(os.Stderr, log.TerminalFormat(true))))
```
You now instead need to do
```golang
log.SetDefault(log.NewLogger(log.NewTerminalHandlerWithLevel(os.Stderr, log.LevelInfo, true)))
```
See more about reasoning here: https://github.com/ethereum/go-ethereum/issues/28558#issuecomment-1820606613
The Go authors updated golang/x/ext to change the function signature of the slices sort method.
It's an entire shitshow now because x/ext is not tagged, so everyone's codebase just
picked a new version that some other dep depends on, causing our code to fail building.
This PR updates the dep on our code too and does all the refactorings to follow upstream...
This changes TALKREQ message processing to run the handler on separate goroutine,
instead of running on the main discv5 dispatcher goroutine. It's better this way because
it allows the handler to perform blocking actions.
I'm also adding a new method TalkRequestToID here. The method allows implementing
a request flow where one node A sends TALKREQ to another node B, and node B later
sends a TALKREQ back. With TalkRequestToID, node B does not need the ENR of A to
send its request.
* p2p/discover: add more packet information in logs
This adds more fields to discv5 packet logs. These can be useful when
debugging multi-packet interactions.
The FINDNODE message also gets an additional field, OpID for debugging
purposes. This field is not encoded onto the wire.
I'm also removing topic system related message types in this change.
These will come back in the future, where support for them will be
guarded by a config flag.
* p2p/discover/v5wire: rename 'Total' to 'RespCount'
The new name captures the meaning of this field better.
Instead of using a limit of three nodes per message, we can pack more nodes
into each message based on ENR size. In my testing, this halves the number
of sent NODES messages, because ENR size is usually < 300 bytes.
This also adds RLP helper functions that compute the encoded size of
[]byte and string.
Co-authored-by: Martin Holst Swende <martin@swende.se>
Noticed that lookupDistances for FINDNODE requests didn't consider 256 a valid
distance. This is actually part of the example in the comment above the
function, surprised that wasn't tested before.
When receiving PING from an IPv4 address over IPv6, the implementation sent
back a IPv4-in-IPv6 address. This change makes it reflect the IPv4 address.
This PR implements the first one of the "lespay" UDP queries which
is already useful in itself: the capacity query. The server pool is making
use of this query by doing a cheap UDP query to determine whether it is
worth starting the more expensive TCP connection process.
This adds two new methods to UDPv5, AllNodes and LocalNode.
AllNodes returns all the nodes stored in the local table; this is
useful for the purposes of metrics collection and also debugging any
potential issues with other discovery v5 implementations.
LocalNode returns the local node object. The reason for exposing this
is so that users can modify and set/delete new key-value entries in
the local record.
This adds an implementation of the current discovery v5 spec.
There is full integration with cmd/devp2p and enode.Iterator in this
version. In theory we could enable the new protocol as a replacement of
discovery v4 at any time. In practice, there will likely be a few more
changes to the spec and implementation before this can happen.