Commit Graph

38 Commits

Author SHA1 Message Date
Lucas
804d45cc2e
p2p: DNS resolution for static nodes (#30822)
Closes #23210 

# Context 
When deploying Geth in Kubernetes with ReplicaSets, we encountered two
DNS-related issues affecting node connectivity. First, during startup,
Geth tries to resolve DNS names for static nodes too early in the config
unmarshaling phase. If peer nodes aren't ready yet (which is common in
Kubernetes rolling deployments), this causes an immediate failure:


```
INFO [11-26|10:03:42.816] Starting Geth on Ethereum mainnet...
INFO [11-26|10:03:42.817] Bumping default cache on mainnet         provided=1024 updated=4096
Fatal: config.toml, line 81: (p2p.Config.StaticNodes) lookup idontexist.geth.node: no such host
``` 

The second issue comes up when pods get rescheduled to different nodes -
their IPs change but peers keep using the initially resolved IP, never
updating the DNS mapping.

This PR adds proper DNS support for enode:// URLs by deferring resolution
to connection time. It also handles DNS failures gracefully instead of failing
fatally during startup, making it work better in container environments where
IPs are dynamic and peers come and go during rollouts.

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
2024-12-13 12:46:12 +01:00
Felix Lange
bc6569462d
p2p: use netip.Addr where possible (#29891)
enode.Node was recently changed to store a cache of endpoint information. The IP address in the cache is a netip.Addr. I chose that type over net.IP because it is just better. netip.Addr is meant to be used as a value type. Copying it does not allocate, it can be compared with ==, and can be used as a map key.

This PR changes most uses of Node.IP() into Node.IPAddr(), which returns the cached value directly without allocating.
While there are still some public APIs left where net.IP is used, I have converted all code used internally by p2p/discover to the new types. So this does change some public Go API, but hopefully not APIs any external code actually uses.

There weren't supposed to be any semantic differences resulting from this refactoring, however it does introduce one: In package p2p/netutil we treated the 0.0.0.0/8 network (addresses 0.x.y.z) as LAN, but netip.Addr.IsPrivate() doesn't. The treatment of this particular IP address range is controversial, with some software supporting it and others not. IANA lists it as special-purpose and invalid as a destination for a long time, so I don't know why I put it into the LAN list. It has now been marked as special in p2p/netutil as well.
2024-06-05 19:31:04 +02:00
Felix Lange
758fce71fa
p2p: fix race in dialScheduler (#29235)
Co-authored-by: Stefan <stefan@starflinger.eu>
2024-03-12 19:23:24 +01:00
lightclient
cbf2579691
p2p, p2p/discover: add dial metrics (#27621)
This PR adds metrics for p2p dialing, which gives us visibility into the quality of the dial 
candidates  returned by our discovery methods.
2023-07-06 16:20:31 +02:00
Felix Lange
9e6a1c3834
common/mclock: add Alarm (#26333)
Alarm is a timer utility that simplifies code where a timer needs to be rescheduled over
and over. Doing this can be tricky with time.Timer or time.AfterFunc because the channel
requires draining in some cases.

Alarm is optimized for use cases where items are tracked in a heap according to their expiry
time, and a goroutine with a for/select loop wants to be woken up whenever the next item expires.
In this application, the timer needs to be rescheduled when an item is added or removed
from the heap. Using a timer naively, these updates will always require synchronization
with the global runtime timer datastructure to update the timer using Reset. Alarm avoids
this by tracking the next expiry time and only modifies the timer if it would need to fire earlier
than already scheduled.

As an example use, I have converted p2p.dialScheduler to use Alarm instead of AfterFunc.
2023-01-03 12:10:48 +01:00
Felix Lange
b628d72766
build: upgrade to go 1.19 (#25726)
This changes the CI / release builds to use the latest Go version. It also
upgrades golangci-lint to a newer version compatible with Go 1.19.

In Go 1.19, godoc has gained official support for links and lists. The
syntax for code blocks in doc comments has changed and now requires a
leading tab character. gofmt adapts comments to the new syntax
automatically, so there are a lot of comment re-formatting changes in this
PR. We need to apply the new format in order to pass the CI lint stage with
Go 1.19.

With the linter upgrade, I have decided to disable 'gosec' - it produces
too many false-positive warnings. The 'deadcode' and 'varcheck' linters
have also been removed because golangci-lint warns about them being
unmaintained. 'unused' provides similar coverage and we already have it
enabled, so we don't lose much with this change.
2022-09-10 13:25:40 +02:00
Marius van der Wijden
8dbf261fd9
p2p, p2p/enode: fix data races (#23434)
In p2p/dial.go, conn.flags was accessed without using sync/atomic.
This race is fixed by removing the access.

In p2p/enode/iter_test.go, a similar race is resolved by writing the field atomically.

Co-authored-by: Felix Lange <fjl@twurst.com>
2021-08-24 12:22:56 +02:00
baptiste-b-pegasys
860184d542
p2p: remove term "whitelist" (#23295)
Co-authored-by: Felix Lange <fjl@twurst.com>
2021-07-29 17:50:18 +02:00
Felix Lange
6f54ae24cd
p2p: add 0 port check in dialer (#21008)
* p2p: add low port check in dialer

We already have a check like this for UDP ports, add a similar one in
the dialer. This prevents dials to port zero and it's also an extra
layer of protection against spamming HTTP servers.

* p2p/discover: use errLowPort in v4 code

* p2p: change port check

* p2p: add comment

* p2p/simulations/adapters: ensure assigned port is in all node records
2020-05-11 18:11:17 +03:00
Felix Lange
90caa2cabb
p2p: new dial scheduler (#20592)
* p2p: new dial scheduler

This change replaces the peer-to-peer dial scheduler with a new and
improved implementation. The new code is better than the previous
implementation in two key aspects:

- The time between discovery of a node and dialing that node is
  significantly lower in the new version. The old dialState kept
  a buffer of nodes and launched a task to refill it whenever the buffer
  became empty. This worked well with the discovery interface we used to
  have, but doesn't really work with the new iterator-based discovery
  API.

- Selection of static dial candidates (created by Server.AddPeer or
  through static-nodes.json) performs much better for large amounts of
  static peers. Connections to static nodes are now limited like dynanic
  dials and can no longer overstep MaxPeers or the dial ratio.

* p2p/simulations/adapters: adapt to new NodeDialer interface

* p2p: re-add check for self in checkDial

* p2p: remove peersetCh

* p2p: allow static dials when discovery is disabled

* p2p: add test for dialScheduler.removeStatic

* p2p: remove blank line

* p2p: fix documentation of maxDialPeers

* p2p: change "ok" to "added" in static node log

* p2p: improve dialTask docs

Also increase log level for "Can't resolve node"

* p2p: ensure dial resolver is truly nil without discovery

* p2p: add "looking for peers" log message

* p2p: clean up Server.run comments

* p2p: fix maxDialedConns for maxpeers < dialRatio

Always allocate at least one dial slot unless dialing is disabled using
NoDial or MaxPeers == 0. Most importantly, this fixes MaxPeers == 1 to
dedicate the sole slot to dialing instead of listening.

* p2p: fix RemovePeer to disconnect the peer again

Also make RemovePeer synchronous and add a test.

* p2p: remove "Connection set up" log message

* p2p: clean up connection logging

We previously logged outgoing connection failures up to three times.

- in SetupConn() as "Setting up connection failed addr=..."
- in setupConn() with an error-specific message and "id=... addr=..."
- in dial() as "Dial error task=..."

This commit ensures a single log message is emitted per failure and adds
"id=... addr=... conn=..." everywhere (id= omitted when the ID isn't
known yet).

Also avoid printing a log message when a static dial fails but can't be
resolved because discv4 is disabled. The light client hit this case all
the time, increasing the message count to four lines per failed
connection.

* p2p: document that RemovePeer blocks
2020-02-13 11:10:03 +01:00
Kurkó Mihály
4ea9b62b5c dashboard: send current block to the dashboard client (#19762)
This adds all dashboard changes from the last couple months.
We're about to remove the dashboard, but decided that we should
get all the recent work in first in case anyone wants to pick up this
project later on.

* cmd, dashboard, eth, p2p: send peer info to the dashboard
* dashboard: update npm packages, improve UI, rebase
* dashboard, p2p: remove println, change doc
* cmd, dashboard, eth, p2p: cleanup after review
* dashboard: send current block to the dashboard client
2019-11-13 12:13:13 +01:00
Felix Lange
2c37142d2f cmd/devp2p, p2p: dial using node iterator, discovery crawler (#20132)
* p2p/enode: add Iterator and associated utilities

* p2p/discover: add RandomNodes iterator

* p2p: dial using iterator

* cmd/devp2p: add discv4 crawler

* cmd/devp2p: WIP nodeset filter

* cmd/devp2p: fixup lesFilter

* core/forkid: add NewStaticFilter

* cmd/devp2p: make -eth-network filter actually work

* cmd/devp2p: improve crawl timestamp handling

* cmd/devp2p: fix typo

* p2p/enode: fix comment typos

* p2p/discover: fix comment typos

* p2p/discover: rename lookup.next to 'advance'

* p2p: lower discovery mixer timeout

* p2p/enode: implement dynamic FairMix timeouts

* cmd/devp2p: add ropsten support in -eth-network filter

* cmd/devp2p: tweak crawler log message
2019-10-29 17:08:57 +02:00
Felix Lange
c420dcb39c
p2p: enforce connection retry limit on server side (#19684)
The dialer limits itself to one attempt every 30s. Apply the same limit
in Server and reject peers which try to connect too eagerly. The check
against the limit happens right after accepting the connection.

Further changes in this commit ensure we pass the Server logger
down to Peer instances, discovery and dialState. Unit test logging now
works in all Server tests.
2019-06-11 12:45:33 +02:00
Guillaume Ballet
29bba5d0b2 p2p: fix typo in dialstate comment (#19476) 2019-04-18 10:02:11 +03:00
Kurkó Mihály
16e4d0e005 p2p: meter peer traffic, emit metered peer events (#17695)
This change extends the peer metrics collection:

- traces the life-cycle of the peers
- meters the peer traffic separately for every peer
- creates event feed for the peer events
- emits the peer events
2018-10-16 00:40:51 +02:00
Felix Lange
6f607de5d5
p2p, p2p/discover: add signed ENR generation (#17753)
This PR adds enode.LocalNode and integrates it into the p2p
subsystem. This new object is the keeper of the local node
record. For now, a new version of the record is produced every
time the client restarts. We'll make it smarter to avoid that in
the future.

There are a couple of other changes in this commit: discovery now
waits for all of its goroutines at shutdown and the p2p server
now closes the node database after discovery has shut down. This
fixes a leveldb crash in tests. p2p server startup is faster
because it doesn't need to wait for the external IP query
anymore.
2018-10-12 11:47:24 +02:00
Felix Lange
30cd5c1854
all: new p2p node representation (#17643)
Package p2p/enode provides a generalized representation of p2p nodes
which can contain arbitrary information in key/value pairs. It is also
the new home for the node database. The "v4" identity scheme is also
moved here from p2p/enr to remove the dependency on Ethereum crypto from
that package.

Record signature handling is changed significantly. The identity scheme
registry is removed and acceptable schemes must be passed to any method
that needs identity. This means records must now be validated explicitly
after decoding.

The enode API is designed to make signature handling easy and safe: most
APIs around the codebase work with enode.Node, which is a wrapper around
a valid record. Going from enr.Record to enode.Node requires a valid
signature.

* p2p/discover: port to p2p/enode

This ports the discovery code to the new node representation in
p2p/enode. The wire protocol is unchanged, this can be considered a
refactoring change. The Kademlia table can now deal with nodes using an
arbitrary identity scheme. This requires a few incompatible API changes:

  - Table.Lookup is not available anymore. It used to take a public key
    as argument because v4 protocol requires one. Its replacement is
    LookupRandom.
  - Table.Resolve takes *enode.Node instead of NodeID. This is also for
    v4 protocol compatibility because nodes cannot be looked up by ID
    alone.
  - Types Node and NodeID are gone. Further commits in the series will be
    fixes all over the the codebase to deal with those removals.

* p2p: port to p2p/enode and discovery changes

This adapts package p2p to the changes in p2p/discover. All uses of
discover.Node and discover.NodeID are replaced by their equivalents from
p2p/enode.

New API is added to retrieve the enode.Node instance of a peer. The
behavior of Server.Self with discovery disabled is improved. It now
tries much harder to report a working IP address, falling back to
127.0.0.1 if no suitable address can be determined through other means.
These changes were needed for tests of other packages later in the
series.

* p2p/simulations, p2p/testing: port to p2p/enode

No surprises here, mostly replacements of discover.Node, discover.NodeID
with their new equivalents. The 'interesting' API changes are:

 - testing.ProtocolSession tracks complete nodes, not just their IDs.
 - adapters.NodeConfig has a new method to create a complete node.

These changes were needed to make swarm tests work.

Note that the NodeID change makes the code incompatible with old
simulation snapshots.

* whisper/whisperv5, whisper/whisperv6: port to p2p/enode

This port was easy because whisper uses []byte for node IDs and
URL strings in the API.

* eth: port to p2p/enode

Again, easy to port because eth uses strings for node IDs and doesn't
care about node information in any way.

* les: port to p2p/enode

Apart from replacing discover.NodeID with enode.ID, most changes are in
the server pool code. It now deals with complete nodes instead
of (Pubkey, IP, Port) triples. The database format is unchanged for now,
but we should probably change it to use the node database later.

* node: port to p2p/enode

This change simply replaces discover.Node and discover.NodeID with their
new equivalents.

* swarm/network: port to p2p/enode

Swarm has its own node address representation, BzzAddr, containing both
an overlay address (the hash of a secp256k1 public key) and an underlay
address (enode:// URL).

There are no changes to the BzzAddr format in this commit, but certain
operations such as creating a BzzAddr from a node ID are now impossible
because node IDs aren't public keys anymore.

Most swarm-related changes in the series remove uses of
NewAddrFromNodeID, replacing it with NewAddr which takes a complete node
as argument. ToOverlayAddr is removed because we can just use the node
ID directly.
2018-09-25 00:59:00 +02:00
Mymskmkt
1df1187d83 p2p: fix comment typo (#17491) 2018-08-23 11:47:43 +03:00
Dmitry Shulyak
14c76371ba p2p: when peer is removed remove it also from dial history (#16060)
This change removes a peer information from dialing history
when peer is removed from static list. It allows to force a
server to re-dial concrete peer if it is needed.

In our case we are running geth node on mobile devices, and
it is common for a network connection to flap on mobile.
Almost every time it flaps or network connection is changed
from cellular to wifi peers are disconnected with read
timeout. And usually it takes 30 seconds (default expiration
timeout) to recover connection with static peers after
connectivity is restored.

This change allows us to reconnect with peers almost
immediately and it seems harmless enough.
2018-02-21 15:03:26 +01:00
ferhat elmas
1d06e41f04 p2p, swarm/network/kademlia: use IsZero to check for zero time (#15603) 2017-12-04 11:07:10 +01:00
Lewis Marshall
54aeb8e4c0 p2p/simulations: various stability fixes (#15198)
p2p/simulations: introduce dialBan

- Refactor simulations/network connection getters to support
  avoiding simultaneous dials between two peers If two peers dial
  simultaneously, the connection will be dropped to help avoid
  that, we essentially lock the connection object with a
  timestamp which serves as a ban on dialing for a period of time
  (dialBanTimeout).

- The connection getter InitConn can be wrapped and passed to the
  nodes via adapters.NodeConfig#Reachable field and then used by
  the respective services when they initiate connections. This
  massively stablise the emerging connectivity when running with
  hundreds of nodes bootstrapping a network.

p2p: add Inbound public method to p2p.Peer

p2p/simulations: Add server id to logs to support debugging
in-memory network simulations when multiple peers are logging.

p2p: SetupConn now returns error. The dialer checks the error and
only calls resolve if the actual TCP dial fails.
2017-12-01 12:49:04 +01:00
Lewis Marshall
9feec51e2d p2p: add network simulation framework (#14982)
This commit introduces a network simulation framework which
can be used to run simulated networks of devp2p nodes. The
intention is to use this for testing protocols, performing
benchmarks and visualising emergent network behaviour.
2017-09-25 10:08:07 +02:00
Péter Szilágyi
04fcae207d p2p: if no nodes are connected, attempt dialing bootnodes (#13874) 2017-04-10 18:33:41 +02:00
Felix Lange
96ae35e2ac p2p, p2p/discover, p2p/nat: rework logging using context keys 2017-02-28 10:20:29 +01:00
Péter Szilágyi
d4fd06c3dc
all: blidly swap out glog to our log15, logs need rework 2017-02-23 12:16:44 +02:00
Péter Szilágyi
189dee26c6
p2p: remove trailing newlines from log messages 2017-02-23 12:00:04 +02:00
Felix Lange
a47341cf96 p2p, p2p/discover, p2p/discv5: add IP network restriction feature
The p2p packages can now be configured to restrict all communication to
a certain subset of IP networks. This feature is meant to be used for
private networks.
2016-11-22 22:21:18 +01:00
Firescar96
4c3da0f2e1 node, p2p, internal: Add ability to remove peers via admin interface 2016-07-14 18:51:41 -04:00
Felix Lange
6c41e675ec p2p: resolve incomplete dial targets
This change makes it possible to add peers without providing their IP
address. The endpoint of the target node is resolved using the discovery
protocol.
2015-12-17 23:39:49 +01:00
Felix Lange
04c6369a09 p2p, p2p/discover: track bootstrap state in p2p/discover
This change simplifies the dial scheduling logic because it
no longer needs to track whether the discovery table has been
bootstrapped.
2015-12-17 23:38:54 +01:00
Jeffrey Wilcke
269c5c7107 Revert "fdtrack: temporary hack for tracking file descriptor usage"
This reverts commit 5c949d3b3ba81ea0563575b19a7b148aeac4bf61.
2015-08-19 21:46:01 +02:00
Felix Lange
5c949d3b3b fdtrack: temporary hack for tracking file descriptor usage
Package fdtrack logs statistics about open file descriptors.
This should help identify the source of #1549.
2015-08-04 03:10:27 +02:00
Felix Lange
bfbcfbe4a9 all: fix license headers one more time
I forgot to update one instance of "go-ethereum" in commit 3f047be5a.
2015-07-23 18:35:11 +02:00
Felix Lange
3f047be5aa all: update license headers to distiguish GPL/LGPL
All code outside of cmd/ is licensed as LGPL. The headers
now reflect this by calling the whole work "the go-ethereum library".
2015-07-22 18:51:45 +02:00
Felix Lange
ea54283b30 all: update license information 2015-07-07 14:12:44 +02:00
Péter Szilágyi
6994a3daaa p2p: instrument P2P networking layer 2015-06-24 18:33:33 +03:00
Felix Lange
6fb810adaa p2p: throttle all discovery lookups
Lookup calls would spin out of control when network connectivity was
lost. The throttling that was in place only took effect when the table
returned zero results, which doesn't happen very often.

The new throttling should not have a negative impact when the host is
online. Lookups against the network take some time and dials for all
results must complete or hit the cache before a new one is started. This
usually takes longer than four seconds, leaving online lookups
unaffected.

Fixes #1296
2015-06-22 01:07:58 +02:00
Felix Lange
1440f9a37a p2p: new dialer, peer management without locks
The most visible change is event-based dialing, which should be an
improvement over the timer-based system that we have at the moment.
The dialer gets a chance to compute new tasks whenever peers change or
dials complete. This is better than checking peers on a timer because
dials happen faster. The dialer can now make more precise decisions
about whom to dial based on the peer set and we can test those
decisions without actually opening any sockets.

Peer management is easier to test because the tests can inject
connections at checkpoints (after enc handshake, after protocol
handshake).

Most of the handshake stuff is now part of the RLPx code. It could be
exported or move to its own package because it is no longer entangled
with Server logic.
2015-05-25 01:17:14 +02:00