lots of todos

This commit is contained in:
Bryan Stitt 2023-02-25 15:07:05 -08:00
parent 28ac542bc9
commit dd9233d89b
2 changed files with 51 additions and 8 deletions

TODO.md

@ -335,9 +335,29 @@ These are not yet ordered. There might be duplicates. We might not actually need
- [x] don't cache methods that are usually very large
- [x] use http provider when available
- [x] per-chain rpc rate limits
- [-] if we subscribe to a server that is syncing, it gives us null block_data_limit. when it catches up, we don't ever send queries to it. we need to recheck block_data_limit
- [x] canonical block checks giving weird errors. change healthcheck to use block number
  ```
  [2023-02-21T02:58:06Z DEBUG web3_proxy::rpcs::request] error response from blastapi! method=eth_getCode params=(0xa9a8760b8333efae8c9c751e6695a11938ae4b90, 0x73a627f588338804e6dc880154728484f7e0373c29057408c6674d75bdc29d12) err=JsonRpcClientError(JsonRpcError(JsonRpcError { code: -32603, message: "hash 73a627f588338804e6dc880154728484f7e0373c29057408c6674d75bdc29d12 is not currently canonical", data: None }))
  [2023-02-21T02:58:06Z DEBUG web3_proxy::rpcs::one] blastapi failed health check query! Error {
      context: "ProviderError from the backend",
      source: JsonRpcClientError(
          JsonRpcError(
              JsonRpcError {
                  code: -32603,
                  message: "hash 73a627f588338804e6dc880154728484f7e0373c29057408c6674d75bdc29d12 is not currently canonical",
                  data: None,
              },
          ),
      ),
  }
  ```
- [x] add a "failover" tier that is only used if balanced_rpcs has "no servers synced"
- use this tier (and private tier) to check timestamp on latest block. if we are behind that by more than a few seconds, something is wrong
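  The timestamp check above could be sketched like this (a minimal std-only sketch; the function name and `max_lag_secs` threshold are hypothetical, not from the codebase):

  ```rust
  use std::time::{SystemTime, UNIX_EPOCH};

  /// Hypothetical helper: returns true if the latest block's timestamp
  /// (unix seconds, as reported by the failover/private tier) lags the
  /// local clock by more than `max_lag_secs`.
  fn head_block_is_stale(block_timestamp_secs: u64, max_lag_secs: u64) -> bool {
      let now_secs = SystemTime::now()
          .duration_since(UNIX_EPOCH)
          .expect("system clock is before the unix epoch")
          .as_secs();
      // a block timestamp slightly in the future is fine (clock skew);
      // only flag when we are clearly behind.
      now_secs.saturating_sub(block_timestamp_secs) > max_lag_secs
  }
  ```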
- [x] cli flag to set prometheus port
- [x] eth_getLogs is going to unsynced nodes because it only checks start block and not the end block
- [x] have multiple providers on each backend rpc. one websocket for newHeads. and then http providers for handling requests
- erigon only streams the JSON over HTTP. that code isn't enabled for websockets. so this should save memory on the erigon servers
- i think this also means we don't need to worry about changing the id that the user gives us.
- [-] proxy mode for benchmarking all backends
- [-] proxy mode for sending to multiple backends
- [-] let users choose a % of reverts to log (or maybe x/second). someone like curve logging all reverts will be a BIG database very quickly
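  The "x/second" variant of revert logging could look like this (a sketch, not the actual implementation; the `RevertLogLimiter` name is made up):

  ```rust
  use std::time::{Duration, Instant};

  /// Hypothetical limiter: allow at most `max_per_second` revert logs,
  /// dropping the rest so a chatty user can't flood the database.
  struct RevertLogLimiter {
      max_per_second: u32,
      window_start: Instant,
      count_in_window: u32,
  }

  impl RevertLogLimiter {
      fn new(max_per_second: u32) -> Self {
          Self {
              max_per_second,
              window_start: Instant::now(),
              count_in_window: 0,
          }
      }

      /// returns true if this revert should be written to the database
      fn should_log(&mut self) -> bool {
          // reset the counter when a new one-second window starts
          if self.window_start.elapsed() >= Duration::from_secs(1) {
              self.window_start = Instant::now();
              self.count_in_window = 0;
          }
          if self.count_in_window < self.max_per_second {
              self.count_in_window += 1;
              true
          } else {
              false
          }
      }
  }
  ```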
@ -352,10 +372,11 @@ These are not yet ordered. There might be duplicates. We might not actually need
- create the app without applying any config to it
- have a blocking future watching the config file and calling app.apply_config() on first load and on change
- work started on this in the "config_reloads" branch. because of how we pass channels around during spawn, this requires a larger refactor.
- [ ] have multiple providers on each backend rpc. one websocket for newHeads. and then http providers for handling requests
- erigon only streams the JSON over HTTP. that code isn't enabled for websockets. so this should save memory on the erigon servers
- i think this also means we don't need to worry about changing the id that the user gives us.
- have the healthcheck get the block over http. if it errors, or doesn't match what the websocket says, something is wrong (likely a deadlock in the websocket code)
- change the premium concurrency limit to be against ip+rpckey
- then sites like curve.fi don't have to worry about their user count
- it does mean we will have a harder time capacity planning from the number of keys
- [ ] eth_getLogs is going to unsynced nodes when synced nodes are available. always prefer synced nodes
- [ ] have the healthcheck get the block over http. if it errors, or doesn't match what the websocket says, something is wrong (likely a deadlock in the websocket code)
- [ ] don't use new_head_provider anywhere except new head subscription
- [ ] maybe we shouldn't route eth_getLogs to syncing nodes. serving queries slows down sync significantly
- change the send_best function to only include servers that are at least close to fully synced
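  That "close to fully synced" filter could be sketched as follows (std-only; `BackendRpc` here is a stand-in struct, not the real type, and `max_lag` is an assumed tunable):

  ```rust
  /// Hypothetical, simplified view of a backend for server selection.
  struct BackendRpc {
      name: &'static str,
      head_block: u64,
  }

  /// Keep only backends whose head block is within `max_lag` blocks of
  /// the consensus head, so eth_getLogs never lands on a far-behind node.
  fn filter_near_synced(
      backends: &[BackendRpc],
      consensus_head: u64,
      max_lag: u64,
  ) -> Vec<&BackendRpc> {
      backends
          .iter()
          .filter(|b| consensus_head.saturating_sub(b.head_block) <= max_lag)
          .collect()
  }
  ```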
@ -396,7 +417,6 @@ These are not yet ordered. There might be duplicates. We might not actually need
- maybe sum available_requests grouped by archive/non-archive. only limit to non-archive if they have enough?
- [ ] some places we call it "accounting" others a "stat". be consistent
- [ ] cli commands to search users by key
- [x] cli flag to set prometheus port
- [ ] flamegraphs show 25% of the time to be in moka-housekeeper. tune that
- [ ] config parsing is strict right now. this makes it hard to deploy on git push since configs need to change along with it
- [ ] when displaying the user's data, they just see an opaque id for their tier. We should join that data
@ -488,9 +508,30 @@ These are not yet ordered. There might be duplicates. We might not actually need
- [ ] relevant erigon changelogs: add pendingTransactionWithBody subscription method (#5675)
- [ ] change_user_tier_by_key should not show the rpc key id. that way our ansible playbook won't expose it
- [ ] make sure all our responses follow the spec: https://www.jsonrpc.org/specification#examples
- [ ] min_sum_soft_limit should be automatic based on the app's average rps plus a buffer.
  - if min_sum_soft_limit > max_sum_soft_limit, just wait for all? emit a warning
- [ ] add a rate counter to the balanced_rpcs
- [ ] every time a block is found, update min_sum_soft_limit
- [ ] add a min_sum_soft_limit_safety
  - keeps the automatically calculated limit from going so high that we stop serving requests
- [ ] add a min_sum_soft_limit_max_wait that advances the consensus block even if the minimums are not met yet
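  The automatic limit plus safety cap described above could be as simple as (a sketch under assumed names; `buffer_fraction` and `safety_cap` are hypothetical parameters):

  ```rust
  /// Derive min_sum_soft_limit from the app's recent average rps plus a
  /// headroom buffer, clamped by a safety cap so an rps spike can't push
  /// the minimum so high that consensus is never reached.
  fn auto_min_sum_soft_limit(average_rps: f64, buffer_fraction: f64, safety_cap: u64) -> u64 {
      let wanted = (average_rps * (1.0 + buffer_fraction)).ceil() as u64;
      wanted.min(safety_cap)
  }
  ```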
- [ ] a script for load testing a server and calculating its hard and soft limits
- [ ] use https://github.com/dherman/esprit or similar to parse https://github.com/DefiLlama/chainlist/blob/main/constants/extraRpcs.js
- [ ] update example.toml
- might need to make changes so the influxdb stuff is optional. david said it stopped right after starting
- [ ] i'm seeing a bunch of errors with eth_getLogs.
  - i think maybe my block number rewriting is causing problems. but maybe it's just a user doing bad queries
- [ ] Use "is_fresh" instead of our atomic bool
  - moka 0.10 - Add entry and entry_by_ref APIs to sync and future caches (#193):
    They allow users to perform more complex operations on a cache entry. At this point, the following operations (methods) are provided:
    - or_default
    - or_insert
    - or_insert_with
    - or_insert_with_if
    - or_optionally_insert_with
    - or_try_insert_with
    The above methods return the Entry type, which provides an is_fresh method to check if the value was freshly computed or already existed in the cache.
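  A std-only analogue of what moka's entry API provides, to show why is_fresh can replace the separate atomic bool (the helper name is made up; the real code would use moka's Entry type directly):

  ```rust
  use std::collections::HashMap;

  /// Hypothetical helper mimicking moka's entry().or_insert_with():
  /// returns whether the value was freshly computed along with the value,
  /// so no separate "first seen" atomic bool is needed.
  fn get_or_insert_with<K: std::hash::Hash + Eq, V>(
      map: &mut HashMap<K, V>,
      key: K,
      init: impl FnOnce() -> V,
  ) -> (bool, &V) {
      let mut is_fresh = false;
      let value = map.entry(key).or_insert_with(|| {
          // only runs when the key was absent
          is_fresh = true;
          init()
      });
      (is_fresh, value)
  }
  ```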
- [ ] lag message always shows on first response
- http interval on blastapi lagging by 1!
## V2

@ -1,4 +1,6 @@
//! A counter of events in a time period.
//!
//! TODO: maybe better to do something like this though: https://github.com/facebookarchive/metrics/blob/master/ewma.go
use std::collections::VecDeque;
use tokio::time::{Duration, Instant};