In the random sync algorithm used by the DNS node iterator, we first pick a random
tree and then perform one sync action on that tree. This happens in a loop until any
node is found. If no trees contain any nodes, the iterator will enter a hot loop spinning
at 100% CPU.
The fix is complicated. The iterator now checks if a meaningful sync action can
be performed on any tree. If there is nothing to do, it waits for the next root record
recheck time to arrive and then tries again.
Fixes#22306
Prevents a situation where we (not running snap) connects with a peer running snap, and get stalled waiting for snap registration to succeed (which will never happen), which cause a waitgroup wait to halt shutdown
This PR enables running the new discv5 protocol in both LES client
and server mode. In client mode it mixes discv5 and dnsdisc iterators
(if both are enabled) and filters incoming ENRs for "les" tag and fork ID.
The old p2p/discv5 package and all references to it are removed.
Co-authored-by: Felix Lange <fjl@twurst.com>
USB enumeration still occured. Make sure it will only occur if --usb is set.
This also deprecates the 'NoUSB' config file option in favor of a new option 'USB'.
The database panicked for invalid IPs. This is usually no problem
because all code paths leading to node DB access verify the IP, but it's
dangerous because improper validation can turn this panic into a DoS
vulnerability. The quick fix here is to just turn database accesses
using invalid IP into a noop. This isn't great, but I'm planning to
remove the node DB for discv5 long-term, so it should be fine to have
this quick fix for half a year.
Fixes#21849
This PR fixes a deadlock reported here: #21925
The cause is that many operations may be pending, but if the close happens, only one of them gets awoken and exits, the others remain waiting for a signal that never comes.
This fixes a deadlock that could occur when a response packet arrived
after a call had already received enough responses and was about to
signal completion to the dispatch loop.
Co-authored-by: Felix Lange <fjl@twurst.com>
- Remove the ws:// prefix from the status endpoint since
the ws:// is already included in the stack.WSEndpoint().
- Don't register the services again in the node start.
Registration is already done in the initialization stage.
- Expose admin namespace via websocket.
This namespace is necessary for connecting the peers via websocket.
- Offer logging relevant options for exec adapter.
It's really painful to mix all log output in the single console. So
this PR offers two additional options for exec adapter in this case
testers can config the log output(e.g. file output) and log level
for each p2p node.
This adds a few tiny fixes for les and the p2p simulation framework:
LES Parts
- Keep the LES-SERVER connection even it's non-synced
We had this idea to reject the connections in LES protocol if the les-server itself is
not synced. However, in LES protocol we will also receive the connection from another
les-server. In this case even the local node is not synced yet, we should keep the tcp
connection for other protocols(e.g. eth protocol).
- Don't count "invalid message" for non-existing GetBlockHeadersMsg request
In the eth syncing mechanism (full sync, fast sync, light sync), it will try to fetch
some non-existent blocks or headers(to ensure we indeed download all the missing chain).
In this case, it's possible that the les-server will receive the request for
non-existent headers. So don't count it as the "invalid message" for scheduling
dropping.
- Copy the announce object in the closure
Before the les-server pushes the latest headers to all connected clients, it will create
a closure and queue it in the underlying request scheduler. In some scenarios it's
problematic. E.g, in private networks, the block can be mined very fast. So before the
first closure is executed, we may already update the latest_announce object. So actually
the "announce" object we want to send is replaced.
The downsize is the client will receive two announces with the same td and then drop the
server.
P2P Simulation Framework
- Don't double register the protocol services in p2p-simulation "Start".
The protocols upon the devp2p are registered in the "New node stage". So don't reigster
them again when starting a node in the p2p simulation framework
- Add one more new config field "ExternalSigner", in order to use clef service in the
framework.
* peer: return localAddr instead of name to prevent spam
We currently use the name (which can be freely set by the peer) in several log messages.
This enables malicious actors to write spam into your geth log.
This commit returns the localAddr instead of the freely settable name.
* p2p: reduce usage of peer.Name in warn messages
* eth, p2p: use truncated names
* Update peer.go
Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de>
Co-authored-by: Felix Lange <fjl@twurst.com>
For some reason, using the shared hash causes a cryptographic incompatibility
when using Go 1.15. I noticed this during the development of Discovery v5.1
when I added test vector verification.
The go library commit that broke this is golang/go@97240d5, but the
way we used HKDF is slightly dodgy anyway and it's not a regression.
This change moves the RLPx protocol implementation into a separate package,
p2p/rlpx. The new package can be used to establish RLPx connections for
protocol testing purposes.
Co-authored-by: Felix Lange <fjl@twurst.com>
This PR adds an extra guarantee to NodeStateMachine: it ensures that all
immediate effects of a certain change are processed before any subsequent
effects of any of the immediate effects on the same node. In the original
version, if a cascaded change caused a subscription callback to be called
multiple times for the same node then these calls might have happened in a
wrong chronological order.
For example:
- a subscription to flag0 changes flag1 and flag2
- a subscription to flag1 changes flag3
- a subscription to flag1, flag2 and flag3 was called in the following order:
[flag1] -> [flag1, flag3]
[] -> [flag1]
[flag1, flag3] -> [flag1, flag2, flag3]
This happened because the tree of changes was traversed in a "depth-first
order". Now it is traversed in a "breadth-first order"; each node has a
FIFO queue for pending callbacks and each triggered subscription callback
is added to the end of the list. The already existing guarantees are
retained; no SetState or SetField returns until the callback queue of the
node is empty again. Just like before, it is the responsibility of the
state machine design to ensure that infinite state loops are not possible.
Multiple changes affecting the same node can still happen simultaneously;
in this case the changes can be interleaved in the FIFO of the node but the
correct order is still guaranteed.
A new unit test is also added to verify callback order in the above scenario.
This change improves discovery behavior in small networks. Very small
networks would often fail to bootstrap because all member nodes were
dropping table content due to findnode failure. The check is now changed
to avoid dropping nodes on findnode failure when their bucket is almost
empty. It also relaxes the liveness check requirement for FINDNODE/v4
response nodes, returning unverified nodes as results when there aren't
any verified nodes yet.
The "findnode failed" log now reports whether the node was dropped
instead of the number of results. The value of the "results" was
always zero by definition.
Co-authored-by: Felix Lange <fjl@twurst.com>
This adds a lock around requests because some routers can't handle
concurrent requests. Requests are also rate-limited.
The Map function request a new mapping exactly when the map timeout
occurs instead of 5 minutes earlier. This should prevent duplicate mappings.
This PR significantly changes the APIs for instantiating Ethereum nodes in
a Go program. The new APIs are not backwards-compatible, but we feel that
this is made up for by the much simpler way of registering services on
node.Node. You can find more information and rationale in the design
document: https://gist.github.com/renaynay/5bec2de19fde66f4d04c535fd24f0775.
There is also a new feature in Node's Go API: it is now possible to
register arbitrary handlers on the user-facing HTTP server. In geth, this
facility is used to enable GraphQL.
There is a single minor change relevant for geth users in this PR: The
GraphQL API is no longer available separately from the JSON-RPC HTTP
server. If you want GraphQL, you need to enable it using the
./geth --http --graphql flag combination.
The --graphql.port and --graphql.addr flags are no longer available.
This PR reimplements the light client server pool. It is also a first step
to move certain logic into a new lespay package. This package will contain
the implementation of the lespay token sale functions, the token buying and
selling logic and other components related to peer selection/prioritization
and service quality evaluation. Over the long term this package will be
reusable for incentivizing future protocols.
Since the LES peer logic is now based on enode.Iterator, it can now use
DNS-based fallback discovery to find servers.
This document describes the function of the new components:
https://gist.github.com/zsfelfoldi/3c7ace895234b7b345ab4f71dab102d4
* p2p: add low port check in dialer
We already have a check like this for UDP ports, add a similar one in
the dialer. This prevents dials to port zero and it's also an extra
layer of protection against spamming HTTP servers.
* p2p/discover: use errLowPort in v4 code
* p2p: change port check
* p2p: add comment
* p2p/simulations/adapters: ensure assigned port is in all node records
It is possible to specify enode URLs using domain name since
commit b90cdbaa79cf, but the code comment still said that only
IP addresses are allowed.
Co-authored-by: admin@komgo.io <KomgoRocks2018!>
This adds two new methods to UDPv5, AllNodes and LocalNode.
AllNodes returns all the nodes stored in the local table; this is
useful for the purposes of metrics collection and also debugging any
potential issues with other discovery v5 implementations.
LocalNode returns the local node object. The reason for exposing this
is so that users can modify and set/delete new key-value entries in
the local record.
This PR adds service value measurement statistics to the light client. It
also adds a private API that makes these statistics accessible. A follow-up
PR will add the new server pool which uses these statistics to select
servers with good performance.
This document describes the function of the new components:
https://gist.github.com/zsfelfoldi/3c7ace895234b7b345ab4f71dab102d4
Co-authored-by: rjl493456442 <garyrong0905@gmail.com>
Co-authored-by: rjl493456442 <garyrong0905@gmail.com>
This adds an implementation of the current discovery v5 spec.
There is full integration with cmd/devp2p and enode.Iterator in this
version. In theory we could enable the new protocol as a replacement of
discovery v4 at any time. In practice, there will likely be a few more
changes to the spec and implementation before this can happen.
This adds additional logic to re-resolve the root name of a tree when a
couple of leaf requests have failed. We need this change to avoid
getting into a failure state where leaf requests keep failing for half
an hour when the tree has been updated.
* p2p: new dial scheduler
This change replaces the peer-to-peer dial scheduler with a new and
improved implementation. The new code is better than the previous
implementation in two key aspects:
- The time between discovery of a node and dialing that node is
significantly lower in the new version. The old dialState kept
a buffer of nodes and launched a task to refill it whenever the buffer
became empty. This worked well with the discovery interface we used to
have, but doesn't really work with the new iterator-based discovery
API.
- Selection of static dial candidates (created by Server.AddPeer or
through static-nodes.json) performs much better for large amounts of
static peers. Connections to static nodes are now limited like dynanic
dials and can no longer overstep MaxPeers or the dial ratio.
* p2p/simulations/adapters: adapt to new NodeDialer interface
* p2p: re-add check for self in checkDial
* p2p: remove peersetCh
* p2p: allow static dials when discovery is disabled
* p2p: add test for dialScheduler.removeStatic
* p2p: remove blank line
* p2p: fix documentation of maxDialPeers
* p2p: change "ok" to "added" in static node log
* p2p: improve dialTask docs
Also increase log level for "Can't resolve node"
* p2p: ensure dial resolver is truly nil without discovery
* p2p: add "looking for peers" log message
* p2p: clean up Server.run comments
* p2p: fix maxDialedConns for maxpeers < dialRatio
Always allocate at least one dial slot unless dialing is disabled using
NoDial or MaxPeers == 0. Most importantly, this fixes MaxPeers == 1 to
dedicate the sole slot to dialing instead of listening.
* p2p: fix RemovePeer to disconnect the peer again
Also make RemovePeer synchronous and add a test.
* p2p: remove "Connection set up" log message
* p2p: clean up connection logging
We previously logged outgoing connection failures up to three times.
- in SetupConn() as "Setting up connection failed addr=..."
- in setupConn() with an error-specific message and "id=... addr=..."
- in dial() as "Dial error task=..."
This commit ensures a single log message is emitted per failure and adds
"id=... addr=... conn=..." everywhere (id= omitted when the ID isn't
known yet).
Also avoid printing a log message when a static dial fails but can't be
resolved because discv4 is disabled. The light client hit this case all
the time, increasing the message count to four lines per failed
connection.
* p2p: document that RemovePeer blocks
This is a temporary fix for a problem which started happening when the
dialer was changed to read nodes from an enode.Iterator. Before the
iterator change, discovery queries would always return within a couple
seconds even if there was no Internet access. Since the iterator won't
return unless a node is actually found, discoverTask can take much
longer. This means that the 'emergency connect' logic might not execute
in time, leading to a stuck node.
* p2p/dnsdisc: add support for enode.Iterator
This changes the dnsdisc.Client API to support the enode.Iterator
interface.
* p2p/dnsdisc: rate-limit DNS requests
* p2p/dnsdisc: preserve linked trees across root updates
This improves the way links are handled when the link root changes.
Previously, sync would simply remove all links from the current tree and
garbage-collect all unreachable trees before syncing the new list of
links.
This behavior isn't great in certain cases: Consider a structure where
trees A, B, and C reference each other and D links to A. If D's link
root changed, the sync code would first remove trees A, B and C, only to
re-sync them later when the link to A was found again.
The fix for this problem is to track the current set of links in each
clientTree and removing old links only AFTER all links are synced.
* p2p/dnsdisc: deflake iterator test
* cmd/devp2p: adapt dnsClient to new p2p/dnsdisc API
* p2p/dnsdisc: tiny comment fix
* build: use golangci-lint
This changes build/ci.go to download and run golangci-lint instead
of gometalinter.
* core/state: fix unnecessary conversion
* p2p/simulations: fix lock copying (found by go vet)
* signer/core: fix unnecessary conversions
* crypto/ecies: remove unused function cmpPublic
* core/rawdb: remove unused function print
* core/state: remove unused function xTestFuzzCutter
* core/vm: disable TestWriteExpectedValues in a different way
* core/forkid: remove unused function checksum
* les: remove unused type proofsData
* cmd/utils: remove unused functions prefixedNames, prefixFor
* crypto/bn256: run goimports
* p2p/nat: fix goimports lint issue
* cmd/clef: avoid using unkeyed struct fields
* les: cancel context in testRequest
* rlp: delete unreachable code
* core: gofmt
* internal/build: simplify DownloadFile for Go 1.11 compatibility
* build: remove go test --short flag
* .travis.yml: disable build cache
* whisper/whisperv6: fix ineffectual assignment in TestWhisperIdentityManagement
* .golangci.yml: enable goconst and ineffassign linters
* build: print message when there are no lint issues
* internal/build: refactor download a bit
* rpc: improve codec abstraction
rpc.ServerCodec is an opaque interface. There was only one way to get a
codec using existing APIs: rpc.NewJSONCodec. This change exports
newCodec (as NewFuncCodec) and NewJSONCodec (as NewCodec). It also makes
all codec methods non-public to avoid showing internals in godoc.
While here, remove codec options in tests because they are not
supported anymore.
* p2p/simulations: use github.com/gorilla/websocket
This package was the last remaining user of golang.org/x/net/websocket.
Migrating to the new library wasn't straightforward because it is no
longer possible to treat WebSocket connections as a net.Conn.
* vendor: delete golang.org/x/net/websocket
* rpc: fix godoc comments and run gofmt
This adds all dashboard changes from the last couple months.
We're about to remove the dashboard, but decided that we should
get all the recent work in first in case anyone wants to pick up this
project later on.
* cmd, dashboard, eth, p2p: send peer info to the dashboard
* dashboard: update npm packages, improve UI, rebase
* dashboard, p2p: remove println, change doc
* cmd, dashboard, eth, p2p: cleanup after review
* dashboard: send current block to the dashboard client
This adds an implementation of node discovery via DNS TXT records to the
go-ethereum library. The implementation doesn't match EIP-1459 exactly,
the main difference being that this implementation uses separate merkle
trees for tree links and ENRs. The EIP will be updated to match p2p/dnsdisc.
To maintain DNS trees, cmd/devp2p provides a frontend for the p2p/dnsdisc
library. The new 'dns' subcommands can be used to create, sign and deploy DNS
discovery trees.
Most of these changes are related to the Go 1.13 changes to test binary
flag handling.
* cmd/geth: make attach tests more reliable
This makes the test wait for the endpoint to come up by polling
it instead of waiting for two seconds.
* tests: fix test binary flags for Go 1.13
Calling flag.Parse during package initialization is prohibited
as of Go 1.13 and causes test failures. Call it in TestMain instead.
* crypto/ecies: remove useless -dump flag in tests
* p2p/simulations: fix test binary flags for Go 1.13
Calling flag.Parse during package initialization is prohibited
as of Go 1.13 and causes test failures. Call it in TestMain instead.
* build: remove workaround for ./... vendor matching
This workaround was necessary for Go 1.8. The Go 1.9 release changed
the expansion rules to exclude vendored packages.
* Makefile: use relative path for GOBIN
This makes the "Run ./build/bin/..." line look nicer.
* les: fix test binary flags for Go 1.13
Calling flag.Parse during package initialization is prohibited
as of Go 1.13 and causes test failures. Call it in TestMain instead.
* eth: chain config (genesis + fork) ENR entry
* core/forkid, eth: protocol independent fork ID, update to CRC32 spec
* core/forkid, eth: make forkid a struct, next uint64, enr struct, RLP
* core/forkid: change forkhash rlp encoding from int to [4]byte
* eth: fixup eth entry a bit and update it every block
* eth: fix lint
* eth: fix crash in ethclient tests
The dialer limits itself to one attempt every 30s. Apply the same limit
in Server and reject peers which try to connect too eagerly. The check
against the limit happens right after accepting the connection.
Further changes in this commit ensure we pass the Server logger
down to Peer instances, discovery and dialState. Unit test logging now
works in all Server tests.
* p2p/enr: add entries for for IPv4/IPv6 separation
This adds entry types for "ip6", "udp6", "tcp6" keys. The IP type stays
around because removing it would break a lot of code and force everyone
to care about the distinction.
* p2p/enode: track IPv4 and IPv6 address separately
LocalNode predicts the local node's UDP endpoint and updates the record.
This change makes it predict IPv4 and IPv6 endpoints separately since
they can now be in the record at the same time.
* p2p/enode: implement base64 text format
* all: switch to enode.Parse(...)
This allows passing base64-encoded node records to all the places that
previously accepted enode:// URLs. The URL format is still supported.
* cmd/bootnode, p2p: log node URL instead of ENR
...and return the base64 record in NodeInfo.
* p2p/discover: export Ping and RequestENR
These two are useful for checking the status of a node.
* cmd/devp2p: add devp2p debug tool
This is a new tool for debugging p2p issues. It supports a few
basic tasks for now, but many more things can and will be added
in the near future.
devp2p enrdump -- prints ENRs readably
devp2p discv4 ping -- checks if a node is up
devp2p discv4 requestenr -- gets a node's record
devp2p discv4 resolve -- finds a node through the DHT
This change implements EIP-868. The UDPv4 transport announces support
for the extension in ping/pong and handles enrRequest messages.
There are two uses of the extension: If a remote node announces support
for EIP-868 in their pong, node revalidation pulls the node's record.
The Resolve method requests the record unconditionally.
cmd/swarm/swarm-smoke: improve smoke tests (#1337)
swarm/network: remove dead code (#1339)
swarm/network: remove FetchStore and SyncChunkStore in favor of NetStore (#1342)
This change restructures the internals of p2p/discover to make room for
the discv5 code which will soon be added to this package.
- packet type names now have a "V4" suffix.
- ListenUDP returns *UDPv4 instead of *Table. This technically breaks
the API but the only caller in go-ethereum is package p2p, which uses
a compatible interface and doesn't need changes.
- The internal transport interface is changed to make Table reusable for v5.
- The 'lookup' code moves from table to transport. This required
updating the lookup unit test to use udpTest instead of a custom transport.
* swarm/network: fix data races in TestInitialPeersMsg test
* swarm/network: add Kademlia.Saturation method with lock
* swarm/network: add Hive.Peer method to safely retrieve a bzz peer
* swarm/network: remove duplicate comment
* p2p/testing: prevent goroutine leak in ProtocolTester
* swarm/network: fix data race in newBzzBaseTesterWithAddrs
* swarm/network: fix goroutone leaks in testInitialPeersMsg
* swarm/network: raise number of peer check attempts in testInitialPeersMsg
* swarm/network: use Hive.Peer in Hive.PeerInfo function
* swarm/network: reduce the scope of mutex lock in newBzzBaseTesterWithAddrs
* swarm/storage: disable TestCleanIndex with race detector
This resolves a minor issue where neighbors responses containing less
than 16 nodes would bump the failure counter, removing the node. One
situation where this can happen is a private deployment where the total
number of extant nodes is less than 16.
Issue found by @jsying.
dummyHook's fields were concurrently written by nodes and read by
the test. The simplest solution is to protect all fields with a mutex.
Enable: TestMultiplePeersDropSelf, TestMultiplePeersDropOther as they
seemingly accidentally stayed disabled during a refactor/rewrite
since 1836366ac19e30f157570e61342fae53bc6c8a57.
resolvesethersphere/go-ethereum#1286
* p2p/discover: remove unused function
* p2p/enode: use localItemKey for local sequence number
I added localItemKey for this purpose in #18963, but then
forgot to actually use it. This changes the database layout
yet again and requires bumping the version number.
This change
- implements concurrent LES request serving even for a single peer.
- replaces the request cost estimation method with a cost table based on
benchmarks which gives much more consistent results. Until now the
allowed number of light peers was just a guess which probably contributed
a lot to the fluctuating quality of available service. Everything related
to request cost is implemented in a single object, the 'cost tracker'. It
uses a fixed cost table with a global 'correction factor'. Benchmark code
is included and can be run at any time to adapt costs to low-level
implementation changes.
- reimplements flowcontrol.ClientManager in a cleaner and more efficient
way, with added capabilities: There is now control over bandwidth, which
allows using the flow control parameters for client prioritization.
Target utilization over 100 percent is now supported to model concurrent
request processing. Total serving bandwidth is reduced during block
processing to prevent database contention.
- implements an RPC API for the LES servers allowing server operators to
assign priority bandwidth to certain clients and change prioritized
status even while the client is connected. The new API is meant for
cases where server operators charge for LES using an off-protocol mechanism.
- adds a unit test for the new client manager.
- adds an end-to-end test using the network simulator that tests bandwidth
control functions through the new API.
I added localItemKey for this purpose in #18963, but then
forgot to actually use it. This changes the database layout
yet again and requires bumping the version number.
* swarm/network: DRY out repeated giga comment
I not necessarily agree with the way we wait for event propagation.
But I truly disagree with having duplicated giga comments.
* p2p/simulations: encapsulate Node.Up field so we avoid data races
The Node.Up field was accessed concurrently without "proper" locking.
There was a lock on Network and that was used sometimes to access
the field. Other times the locking was missed and we had
a data race.
For example: https://github.com/ethereum/go-ethereum/pull/18464
The case above was solved, but there were still intermittent/hard to
reproduce races. So let's solve the issue permanently.
resolves: ethersphere/go-ethereum#1146
* p2p/simulations: fix unmarshal of simulations.Node
Making Node.Up field private in 13292ee897e345045fbfab3bda23a77589a271c1
broke TestHTTPNetwork and TestHTTPSnapshot. Because the default
UnmarshalJSON does not handle unexported fields.
Important: The fix is partial and not proper to my taste. But I cut
scope as I think the fix may require a change to the current
serialization format. New ticket:
https://github.com/ethersphere/go-ethereum/issues/1177
* p2p/simulations: Add a sanity test case for Node.Config UnmarshalJSON
* p2p/simulations: revert back to defer Unlock() pattern for Network
It's a good patten to call `defer Unlock()` right after `Lock()` so
(new) error cases won't miss to unlock. Let's get back to that pattern.
The patten was abandoned in 85a79b3ad3c5863f8612d25c246bcfad339f36b7,
while fixing a data race. That data race does not exist anymore,
since the Node.Up field got hidden behind its own lock.
* p2p/simulations: consistent naming for test providers Node.UnmarshalJSON
* p2p/simulations: remove JSON annotation from private fields of Node
As unexported fields are not serialized.
* p2p/simulations: fix deadlock in Network.GetRandomDownNode()
Problem: GetRandomDownNode() locks -> getDownNodeIDs() ->
GetNodes() tries to lock -> deadlock
On Network type, unexported functions must assume that `net.lock`
is already acquired and should not call exported functions which
might try to lock again.
* p2p/simulations: ensure method conformity for Network
Connect* methods were moved to p2p/simulations.Network from
swarm/network/simulation. However these new methods did not follow
the pattern of Network methods, i.e., all exported method locks
the whole Network either for read or write.
* p2p/simulations: fix deadlock during network shutdown
`TestDiscoveryPersistenceSimulationSimAdapter` often got into deadlock.
The execution was stuck on two locks, i.e, `Kademlia.lock` and
`p2p/simulations.Network.lock`. Usually the test got stuck once in each
20 executions with high confidence.
`Kademlia` was stuck in `Kademlia.EachAddr()` and `Network` in
`Network.Stop()`.
Solution: in `Network.Stop()` `net.lock` must be released before
calling `node.Stop()` as stopping a node (somehow - I did not find
the exact code path) causes `Network.InitConn()` to be called from
`Kademlia.SuggestPeer()` and that blocks on `net.lock`.
Related ticket: https://github.com/ethersphere/go-ethereum/issues/1223
* swarm/state: simplify if statement in DBStore.Put()
* p2p/simulations: remove faulty godoc from private function
The comment started with the wrong method name.
The method is simple and self explanatory. Also, it's private.
=> Let's just remove the comment.
* node: close AccountsManager in new Close method
* p2p/simulations, p2p/simulations/adapters: handle node close on shutdown
* node: move node ephemeralKeystore cleanup to stop method
* node: call Stop in Node.Close method
* cmd/geth: close node.Node created with makeFullNode in cli commands
* node: close Node instances in tests
* cmd/geth, node: minor code style fixes
* cmd, console, miner, mobile: proper node Close() termination
This change clears up confusion around the two ways in which nodes
can be added to the table.
When a neighbors packet is received as a reply to findnode, the nodes
contained in the reply are added as 'seen' entries if sufficient space
is available.
When a ping is received and the endpoint verification has taken place,
the remote node is added as a 'verified' entry or moved to the front of
the bucket if present. This also updates the node's IP address and port
if they have changed.
This change resolves multiple issues around handling of endpoint proofs.
The proof is now done separately for each IP and completing the proof
requires a matching ping hash.
Also remove waitping because it's equivalent to sleep. waitping was
slightly more efficient, but that may cause issues with findnode if
packets are reordered and the remote end sees findnode before pong.
Logging of received packets was hitherto done after handling the packet,
which meant that sent replies were logged before the packet that
generated them. This change splits up packet handling into 'preverify'
and 'handle'. The error from 'preverify' is logged, but 'handle' happens
after the message is logged. This fixes the order. Packet logs now
contain the node ID.
* p2p/simulation: WIP minimal snapshot test
* p2p/simulation: Add snapshot create, load and verify to snapshot test
* build: add test tag for tests
* p2p/simulations, build: Revert travis change, build test sym always
* p2p/simulations: Add comments, timeout check on additional events
* p2p/simulation: Add benchmark template for minimal peer protocol init
* p2p/simulations: Remove unused code
* p2p/simulation: Correct timer reset
* p2p/simulations: Put snapshot check events in buffer and call blocking
* p2p/simulations: TestSnapshot fail if Load function returns early
* p2p/simulations: TestSnapshot wait for all connections before returning
* p2p/simulation: Revert to before wait for snap load (5e75594)
* p2p/simulations: add "conns after load" subtest to TestSnapshot
and nudge
* swarm/network: Hive - do not notify peer if discovery is disabled
* p2p/simulations: validate all connections on loading a snapshot
* p2p/simulations: track all connections in on snapshot loading
* p2p/simulations: add snapshotLoadTimeout variable
* p2p/simulations: ignore control events in snapshot load
* p2p/simulations: simplify event loop synchronization
* p2p/simulations: return already connected error from Load function
* p2p/simulations: log warning on snapshot loading disconnection
This change extends the peer metrics collection:
- traces the life-cycle of the peers
- meters the peer traffic separately for every peer
- creates event feed for the peer events
- emits the peer events
This PR adds enode.LocalNode and integrates it into the p2p
subsystem. This new object is the keeper of the local node
record. For now, a new version of the record is produced every
time the client restarts. We'll make it smarter to avoid that in
the future.
There are a couple of other changes in this commit: discovery now
waits for all of its goroutines at shutdown and the p2p server
now closes the node database after discovery has shut down. This
fixes a leveldb crash in tests. p2p server startup is faster
because it doesn't need to wait for the external IP query
anymore.
This fixes a rare deadlock with the inproc adapter:
- A node is stopped, which acquires Network.lock.
- The protocol code being simulated (swarm/network in my case)
waits for its goroutines to shut down.
- One of those goroutines calls into the simulation to add a peer,
which waits for Network.lock.
The fix for the deadlock is really simple, just release the lock
before stopping the simulation node.
Other changes in this PR clean up the exec adapter so it reports
node startup errors better and remove the docker adapter because
it just adds overhead.
In the exec adapter, node information is now posted to a one-shot
server. This avoids log parsing and allows reporting startup
errors to the simulation host.
A small change in package node was needed because simulation
nodes use port zero. Node.{HTTP,WS}Endpoint now return the live
endpoints after startup by checking the TCP listener.
Package p2p/enode provides a generalized representation of p2p nodes
which can contain arbitrary information in key/value pairs. It is also
the new home for the node database. The "v4" identity scheme is also
moved here from p2p/enr to remove the dependency on Ethereum crypto from
that package.
Record signature handling is changed significantly. The identity scheme
registry is removed and acceptable schemes must be passed to any method
that needs identity. This means records must now be validated explicitly
after decoding.
The enode API is designed to make signature handling easy and safe: most
APIs around the codebase work with enode.Node, which is a wrapper around
a valid record. Going from enr.Record to enode.Node requires a valid
signature.
* p2p/discover: port to p2p/enode
This ports the discovery code to the new node representation in
p2p/enode. The wire protocol is unchanged, this can be considered a
refactoring change. The Kademlia table can now deal with nodes using an
arbitrary identity scheme. This requires a few incompatible API changes:
- Table.Lookup is not available anymore. It used to take a public key
as argument because v4 protocol requires one. Its replacement is
LookupRandom.
- Table.Resolve takes *enode.Node instead of NodeID. This is also for
v4 protocol compatibility because nodes cannot be looked up by ID
alone.
- Types Node and NodeID are gone. Further commits in the series will be
fixes all over the the codebase to deal with those removals.
* p2p: port to p2p/enode and discovery changes
This adapts package p2p to the changes in p2p/discover. All uses of
discover.Node and discover.NodeID are replaced by their equivalents from
p2p/enode.
New API is added to retrieve the enode.Node instance of a peer. The
behavior of Server.Self with discovery disabled is improved. It now
tries much harder to report a working IP address, falling back to
127.0.0.1 if no suitable address can be determined through other means.
These changes were needed for tests of other packages later in the
series.
* p2p/simulations, p2p/testing: port to p2p/enode
No surprises here, mostly replacements of discover.Node, discover.NodeID
with their new equivalents. The 'interesting' API changes are:
- testing.ProtocolSession tracks complete nodes, not just their IDs.
- adapters.NodeConfig has a new method to create a complete node.
These changes were needed to make swarm tests work.
Note that the NodeID change makes the code incompatible with old
simulation snapshots.
* whisper/whisperv5, whisper/whisperv6: port to p2p/enode
This port was easy because whisper uses []byte for node IDs and
URL strings in the API.
* eth: port to p2p/enode
Again, easy to port because eth uses strings for node IDs and doesn't
care about node information in any way.
* les: port to p2p/enode
Apart from replacing discover.NodeID with enode.ID, most changes are in
the server pool code. It now deals with complete nodes instead
of (Pubkey, IP, Port) triples. The database format is unchanged for now,
but we should probably change it to use the node database later.
* node: port to p2p/enode
This change simply replaces discover.Node and discover.NodeID with their
new equivalents.
* swarm/network: port to p2p/enode
Swarm has its own node address representation, BzzAddr, containing both
an overlay address (the hash of a secp256k1 public key) and an underlay
address (enode:// URL).
There are no changes to the BzzAddr format in this commit, but certain
operations such as creating a BzzAddr from a node ID are now impossible
because node IDs aren't public keys anymore.
Most swarm-related changes in the series remove uses of
NewAddrFromNodeID, replacing it with NewAddr which takes a complete node
as argument. ToOverlayAddr is removed because we can just use the node
ID directly.
* cmd/swarm: minor cli flag text adjustments
* cmd/swarm, swarm/storage, swarm: fix mingw on windows test issues
* cmd/swarm: support for smoke tests on the production swarm cluster
* cmd/swarm/swarm-smoke: simplify cluster logic as per suggestion
* changed colour of landing page
* landing page reacts to enter keypress
* swarm/api/http: sticky footer for swarm landing page using flex
* swarm/api/http: sticky footer for error pages and fix for multiple choices
* swarm: propagate ctx to internal apis (#754)
* swarm/simnet: add basic node/service functions
* swarm/netsim: add buckets for global state and kademlia health check
* swarm/netsim: Use sync.Map as bucket and provide cleanup function for...
* swarm, swarm/netsim: adjust SwarmNetworkTest
* swarm/netsim: fix tests
* swarm: added visualization option to sim net redesign
* swarm/netsim: support multiple services per node
* swarm/netsim: remove redundant return statement
* swarm/netsim: add comments
* swarm: shutdown HTTP in Simulation.Close
* swarm: sim HTTP server timeout
* swarm/netsim: add more simulation methods and peer events examples
* swarm/netsim: add WaitKademlia example
* swarm/netsim: fix comments
* swarm/netsim: terminate peer events goroutines on simulation done
* swarm, swarm/netsim: naming updates
* swarm/netsim: return not healthy kademlias on WaitTillHealthy
* swarm: fix WaitTillHealthy call in testSwarmNetwork
* swarm/netsim: allow bucket to have any type for a key
* swarm: Added snapshots to new netsim
* swarm/netsim: add more tests for bucket
* swarm/netsim: move http related things into separate files
* swarm/netsim: add AddNodeWithService option
* swarm/netsim: add more tests and Start* methods
* swarm/netsim: add peer events and kademlia tests
* swarm/netsim: fix some tests flakiness
* swarm/netsim: improve random nodes selection, fix TestStartStop* tests
* swarm/netsim: remove time measurement from TestClose to avoid flakiness
* swarm/netsim: builder pattern for netsim HTTP server (#773)
* swarm/netsim: add connect related tests
* swarm/netsim: add comment for TestPeerEvents
* swarm: rename netsim package to network/simulation
* p2p/discover: move bond logic from table to transport
This commit moves node endpoint verification (bonding) from the table to
the UDP transport implementation. Previously, adding a node to the table
entailed pinging the node if needed. With this change, the ping-back
logic is embedded in the packet handler at a lower level.
It is easy to verify that the basic protocol is unchanged: we still
require a valid pong reply from the node before findnode is accepted.
The node database tracked the time of last ping sent to the node and
time of last valid pong received from the node. Node endpoints are
considered verified when a valid pong is received and the time of last
pong was called 'bond time'. The time of last ping sent was unused. In
this commit, the last ping database entry is repurposed to mean last
ping _received_. This entry is now used to track whether the node needs
to be pinged back.
The other big change is how nodes are added to the table. We used to add
nodes in Table.bond, which ran when a remote node pinged us or when we
encountered the node in a neighbors reply. The transport now adds to the
table directly after the endpoint is verified through ping. To ensure
that the Table can't be filled just by pinging the node repeatedly, we
retain the isInitDone check. During init, only nodes from neighbors
replies are added.
* p2p/discover: reduce findnode failure counter on success
* p2p/discover: remove unused parameter of loadSeedNodes
* p2p/discover: improve ping-back check and comments
* p2p/discover: add neighbors reply nodes always, not just during init
These RPC calls are analogous to Parity's parity_addReservedPeer and
parity_removeReservedPeer.
They are useful for adjusting the trusted peer set during runtime,
without requiring restarting the server.
This commit adds all changes needed for the merge of swarm-network-rewrite.
The changes:
- build: increase linter timeout
- contracts/ens: export ensNode
- log: add Output method and enable fractional seconds in format
- metrics: relax test timeout
- p2p: reduced some log levels, updates to simulation packages
- rpc: increased maxClientSubscriptionBuffer to 20000
ToECDSAPub was unsafe because it returned a non-nil key with nil X, Y in
case of invalid input. This change replaces ToECDSAPub with
UnmarshalPubkey across the codebase.
This applies spec changes from ethereum/EIPs#1049 and adds support for
pluggable identity schemes.
Some care has been taken to make the "v4" scheme standalone. It uses
public APIs only and could be moved out of package enr at any time.
A couple of minor changes were needed to make identity schemes work:
- The sequence number is now updated in Set instead of when signing.
- Record is now copy-safe, i.e. calling Set on a shallow copy doesn't
modify the record it was copied from.
This change removes a peer information from dialing history
when peer is removed from static list. It allows to force a
server to re-dial concrete peer if it is needed.
In our case we are running geth node on mobile devices, and
it is common for a network connection to flap on mobile.
Almost every time it flaps or network connection is changed
from cellular to wifi peers are disconnected with read
timeout. And usually it takes 30 seconds (default expiration
timeout) to recover connection with static peers after
connectivity is restored.
This change allows us to reconnect with peers almost
immediately and it seems harmless enough.
I forgot to change the check in udp.go when I changed Table.bond to be
based on lastPong instead of node presence in db. Rename lastPong to
bondTime and add hasBond so it's clearer what this DB key is used for
now.