* focus on performance improvement in many aspects.
1. Do BlockBody verification concurrently;
2. Do calculation of intermediate root concurrently;
3. Preload accounts before processing blocks;
4. Make the snapshot layers configurable.
5. Reuse some object to reduce GC.
add
* rlp: improve decoder stream implementation (#22858)
This commit makes various cleanup changes to rlp.Stream.
* rlp: shrink Stream struct
This removes a lot of unused padding space in Stream by reordering the
fields. The size of Stream changes from 120 bytes to 88 bytes. Stream
instances are internally cached and reused using sync.Pool, so this does
not improve performance.
* rlp: simplify list stack
The list stack kept track of the size of the current list context as
well as the current offset into it. The size had to be stored in the
stack in order to subtract it from the remaining bytes of any enclosing
list in ListEnd. It seems that this can be implemented in a simpler
way: just subtract the size from the enclosing list context in List instead.
* rlp: use atomic.Value for type cache (#22902)
All encoding/decoding operations read the type cache to find the
writer/decoder function responsible for a type. When analyzing CPU
profiles of geth during sync, I found that the use of sync.RWMutex in
cache lookups appears in the profiles. It seems we are running into
CPU cache contention problems when package rlp is heavily used
on all CPU cores during sync.
This change makes it use atomic.Value + a writer lock instead of
sync.RWMutex. In the common case where the typeinfo entry is present in
the cache, we simply fetch the map and lookup the type.
* rlp: optimize byte array handling (#22924)
This change improves the performance of encoding/decoding [N]byte.
name old time/op new time/op delta
DecodeByteArrayStruct-8 336ns ± 0% 246ns ± 0% -26.98% (p=0.000 n=9+10)
EncodeByteArrayStruct-8 225ns ± 1% 148ns ± 1% -34.12% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
DecodeByteArrayStruct-8 120B ± 0% 48B ± 0% -60.00% (p=0.000 n=10+10)
EncodeByteArrayStruct-8 0.00B 0.00B ~ (all equal)
* rlp: optimize big.Int decoding for size <= 32 bytes (#22927)
This change grows the static integer buffer in Stream to 32 bytes,
making it possible to decode 256bit integers without allocating a
temporary buffer.
In the recent commit 088da24, Stream struct size decreased from 120
bytes down to 88 bytes. This commit grows the struct to 112 bytes again,
but the size change will not degrade performance because Stream
instances are internally cached in sync.Pool.
name old time/op new time/op delta
DecodeBigInts-8 12.2µs ± 0% 8.6µs ± 4% -29.58% (p=0.000 n=9+10)
name old speed new speed delta
DecodeBigInts-8 230MB/s ± 0% 326MB/s ± 4% +42.04% (p=0.000 n=9+10)
* eth/protocols/eth, les: avoid Raw() when decoding HashOrNumber (#22841)
Getting the raw value is not necessary to decode this type, and
decoding it directly from the stream is faster.
* fix testcase
* debug no lazy
* fix can not repair
* address comments
Co-authored-by: Felix Lange <fjl@twurst.com>
* core: added local tx pool test case
* core, crypto: various allocation savings regarding tx handling
* core/txlist, txpool: save a reheap operation, avoid some bigint allocs
Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de>
Reverting because this change started handling account balances as
uint64 in the transaction pool, which is incorrect.
This reverts commit af5c97aebe1d37486635521ef553cb8bd4bada13.
* core: use uint64 for total tx costs instead of big.Int
* core: added local tx pool test case
* core, crypto: various allocation savings regarding tx handling
* Update core/tx_list.go
* core: added tx.GasPriceIntCmp for comparison without allocation
adds a method to remove unneeded allocation in comparison to tx.gasPrice
* core: handle pools full of locals better
* core/tests: benchmark for tx_list
* core/txlist, txpool: save a reheap operation, avoid some bigint allocs
Co-authored-by: Martin Holst Swende <martin@swende.se>
* core, crypto: various allocation savings regarding tx handling
* core: reduce allocs for gas price comparison
This change reduces the allocations needed for comparing different transactions to each other.
A call to `tx.GasPrice()` copies the gas price as it has to be safe against modifications and
also needs to be threadsafe. For comparing and ordering different transactions we don't need
these guarantees
* core: added tx.GasPriceIntCmp for comparison without allocation
adds a method to remove unneeded allocation in comparison to tx.gasPrice
* core/types: pool legacykeccak256 objects in rlpHash
rlpHash is by far the most used function in core that allocates a legacyKeccak256 object on each call.
Since it is so widely used it makes sense to add pooling here so we relieve the GC.
On my machine these changes result in > 100 MILLION less allocations and > 30 GB less allocated memory.
* reverted some changes
* reverted some changes
* trie: use crypto.KeccakState instead of replicating code
Co-authored-by: Martin Holst Swende <martin@swende.se>
* core: use a wrapped `map` and `sync.RWMutex` for `TxPool.all` to remove contention in `TxPool.Get`.
* core: Remove redundant `txLookup.Find` and improve comments on txLookup methods.
* core: allow price bump at threshold
* core: test changes to allow price bump at threshold
* core: reinstate tx replacement test underneath threshold
* core: minor test failure message cleanups
This PR implements the new LES protocol version extensions:
* new and more efficient Merkle proofs reply format (when replying to
a multiple Merkle proofs request, we just send a single set of trie
nodes containing all necessary nodes)
* BBT (BloomBitsTrie) works similarly to the existing CHT and contains
the bloombits search data to speed up log searches
* GetTxStatusMsg returns the inclusion position or the
pending/queued/unknown state of a transaction referenced by hash
* an optional signature of new block data (number/hash/td) can be
included in AnnounceMsg to provide an option for "very light
clients" (mobile/embedded devices) to skip expensive Ethash check
and accept multiple signatures of somewhat trusted servers (still a
lot better than trusting a single server completely and retrieving
everything through RPC). The new client mode is not implemented in
this PR, just the protocol extension.
The commit reworks the transaction pool queue limitation tests
to cater for testing local accounts, also testing the nolocal flag.
In addition, it also fixes a panic if local transactions exceeded
the global queue allowance (no accounts left to drop from) and also
fixes queue eviction to operate on all accounts, not just the one
being updated.