From dd9233d89b62759e84491609cc41328a81da1bea Mon Sep 17 00:00:00 2001
From: Bryan Stitt
Date: Sat, 25 Feb 2023 15:07:05 -0800
Subject: [PATCH] lots of todos

---
 TODO.md                 | 56 ++++++++++++++++++++++++++++++++++------
 rate-counter/src/lib.rs |  2 ++
 2 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/TODO.md b/TODO.md
index 8db1a2e1..46aa710d 100644
--- a/TODO.md
+++ b/TODO.md
@@ -335,9 +335,28 @@ These are not yet ordered. There might be duplicates. We might not actually need
 - [x] don't cache methods that are usually very large
 - [x] use http provider when available
 - [x] per-chain rpc rate limits
-- [-] if we subscribe to a server that is syncing, it gives us null block_data_limit. when it catches up, we don't ever send queries to it. we need to recheck block_data_limit
+- [x] canonical block checks giving weird errors. change healthcheck to use block number
+  [2023-02-21T02:58:06Z DEBUG web3_proxy::rpcs::request] error response from blastapi! method=eth_getCode params=(0xa9a8760b8333efae8c9c751e6695a11938ae4b90, 0x73a627f588338804e6dc880154728484f7e0373c29057408c6674d75bdc29d12) err=JsonRpcClientError(JsonRpcError(JsonRpcError { code: -32603, message: "hash 73a627f588338804e6dc880154728484f7e0373c29057408c6674d75bdc29d12 is not currently canonical", data: None }))
+  [2023-02-21T02:58:06Z DEBUG web3_proxy::rpcs::one] blastapi failed health check query! Error {
+      context: "ProviderError from the backend",
+      source: JsonRpcClientError(
+          JsonRpcError(
+              JsonRpcError {
+                  code: -32603,
+                  message: "hash 73a627f588338804e6dc880154728484f7e0373c29057408c6674d75bdc29d12 is not currently canonical",
+                  data: None,
+              },
+          ),
+      ),
+  }
 - [x] add a "failover" tier that is only used if balanced_rpcs has "no servers synced"
   - use this tier (and private tier) to check timestamp on latest block. if we are behind that by more than a few seconds, something is wrong
+- [x] cli flag to set prometheus port
+- [x] eth_getLogs is going to unsynced nodes because it only checks the start block and not the end block
+- [x] have multiple providers on each backend rpc. one websocket for newHeads. and then http providers for handling requests
+  - erigon only streams the JSON over HTTP. that code isn't enabled for websockets. so this should save memory on the erigon servers
+  - i think this also means we don't need to worry about changing the id that the user gives us.
+- [-] if we subscribe to a server that is syncing, it gives us null block_data_limit. when it catches up, we don't ever send queries to it. we need to recheck block_data_limit
 - [-] proxy mode for benchmarking all backends
 - [-] proxy mode for sending to multiple backends
 - [-] let users choose a % of reverts to log (or maybe x/second). someone like curve logging all reverts will be a BIG database very quickly
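The eth_getLogs entry in the hunk above tracks a routing bug: only a query's start block was checked, so a request whose range ends past a lagging server's head could still be sent to it. Below is a minimal sketch of the two-ended range check being described; `BackendRpc` and `can_serve_range` are hypothetical stand-ins, not web3-proxy's real types:

```rust
/// A hypothetical stand-in for a backend server's sync state.
struct BackendRpc {
    head_block: u64,
    /// how many blocks behind head this server keeps data for.
    /// None means the server is still syncing and hasn't reported a limit yet.
    block_data_limit: Option<u64>,
}

impl BackendRpc {
    /// true if this server can answer a query spanning from_block..=to_block.
    /// the buggy behavior only checked from_block; also checking to_block
    /// against the head keeps getLogs off of still-syncing servers.
    fn can_serve_range(&self, from_block: u64, to_block: u64) -> bool {
        let Some(limit) = self.block_data_limit else {
            // syncing server: block_data_limit must be rechecked later
            return false;
        };
        let oldest_available = self.head_block.saturating_sub(limit);
        from_block >= oldest_available && to_block <= self.head_block
    }
}

fn main() {
    let synced = BackendRpc { head_block: 1_000, block_data_limit: Some(1_000) };
    let lagging = BackendRpc { head_block: 900, block_data_limit: Some(1_000) };

    // a getLogs query ending at block 950 must skip the lagging server
    assert!(synced.can_serve_range(800, 950));
    assert!(!lagging.can_serve_range(800, 950));
    println!("range check ok");
}
```

The same predicate also covers the null block_data_limit entry: a server that has never reported a limit is excluded until the limit is rechecked.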
@@ -352,10 +371,11 @@ These are not yet ordered. There might be duplicates. We might not actually need
   - create the app without applying any config to it
   - have a blocking future watching the config file and calling app.apply_config() on first load and on change
   - work started on this in the "config_reloads" branch. because of how we pass channels around during spawn, this requires a larger refactor.
-- [ ] have multiple providers on each backend rpc. one websocket for newHeads. and then http providers for handling requests
-  - erigon only streams the JSON over HTTP. that code isn't enabled for websockets. so this should save memory on the erigon servers
-  - i think this also means we don't need to worry about changing the id that the user gives us.
-  - have the healthcheck get the block over http. if it errors, or doesn't match what the websocket says, something is wrong (likely a deadlock in the websocket code)
+- [ ] change premium concurrency limit to be against ip+rpckey
+  - then sites like curve.fi don't have to worry about their user count
+  - it does mean we will have a harder time capacity planning from the number of keys
+- [ ] eth_getLogs is going to unsynced nodes when synced nodes are available. always prefer synced nodes
+- [ ] have the healthcheck get the block over http. if it errors, or doesn't match what the websocket says, something is wrong (likely a deadlock in the websocket code)
 - [ ] don't use new_head_provider anywhere except new head subscription
 - [ ] maybe we shouldn't route eth_getLogs to syncing nodes. serving queries slows down sync significantly
   - change the send_best function to only include servers that are at least close to fully synced
@@ -396,7 +416,6 @@ These are not yet ordered. There might be duplicates. We might not actually need
   - maybe sum available_requests grouped by archive/non-archive. only limit to non-archive if they have enough?
 - [ ] some places we call it "accounting" others a "stat". be consistent
 - [ ] cli commands to search users by key
-- [x] cli flag to set prometheus port
 - [ ] flamegraphs show 25% of the time to be in moka-housekeeper. tune that
 - [ ] config parsing is strict right now. this makes it hard to deploy on git push since configs need to change along with it
 - [ ] when displaying the user's data, they just see an opaque id for their tier. We should join that data
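The ip+rpckey entry in the hunk above proposes scoping the premium concurrency limit to an (ip, rpc key) pair instead of the key alone, so one key shared by all visitors of a site like curve.fi is not throttled as a single caller. A rough sketch of that idea, assuming tokio; `ConcurrencyLimiter` and its fields are illustrative, not web3-proxy's actual types:

```rust
// Cargo.toml (assumed): tokio = { version = "1", features = ["full"] }
use std::collections::HashMap;
use std::net::IpAddr;
use std::sync::{Arc, Mutex};

use tokio::sync::{OwnedSemaphorePermit, Semaphore};

/// (caller ip, rpc key id): limits are per pair, not per key.
type ConcurrencyKey = (IpAddr, u64);

#[derive(Default)]
struct ConcurrencyLimiter {
    semaphores: Mutex<HashMap<ConcurrencyKey, Arc<Semaphore>>>,
}

impl ConcurrencyLimiter {
    /// wait for a concurrency slot for this (ip, key) pair
    async fn acquire(&self, ip: IpAddr, rpc_key_id: u64, max_concurrent: usize) -> OwnedSemaphorePermit {
        let semaphore = {
            let mut map = self.semaphores.lock().unwrap();
            map.entry((ip, rpc_key_id))
                .or_insert_with(|| Arc::new(Semaphore::new(max_concurrent)))
                .clone()
        };
        // the permit is released when the request handler drops it
        semaphore.acquire_owned().await.expect("semaphore closed")
    }
}

#[tokio::main]
async fn main() {
    let limiter = ConcurrencyLimiter::default();
    let ip_a: IpAddr = "203.0.113.7".parse().unwrap();
    let ip_b: IpAddr = "203.0.113.8".parse().unwrap();

    // two callers behind different ips on the same key get separate limits
    let _permit_a = limiter.acquire(ip_a, 42, 10).await;
    let _permit_b = limiter.acquire(ip_b, 42, 10).await;
    println!("both callers acquired permits independently");
}
```

A real implementation would evict idle (ip, key) entries, for example with an expiring cache, so the map cannot grow without bound. The capacity-planning downside noted in the TODO remains: keys no longer map one-to-one to limiters.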
@@ -488,9 +507,30 @@ These are not yet ordered. There might be duplicates. We might not actually need
 - [ ] relevant erigon changelogs: add pendingTransactionWithBody subscription method (#5675)
 - [ ] change_user_tier_by_key should not show the rpc key id. that way our ansible playbook won't expose it
 - [ ] make sure all our responses follow the spec: https://www.jsonrpc.org/specification#examples
-- [ ] min_sum_soft_limit should be automatic based on the apps average rps plus a buffer.
-  - if min_sum_soft_limit > max_sum_soft_limit, just wait for all? emit a warning
+- [ ] min_sum_soft_limit should be automatic based on the app's average rps plus a buffer.
+  - [ ] add a rate counter to the balanced_rpcs
+  - [ ] every time a block is found, update min_sum_soft_limit
+  - [ ] add a min_sum_soft_limit_safety
+    - keeps the automatically calculated limit from going so high that we stop serving requests
+  - [ ] add a min_sum_soft_limit_max_wait that advances the consensus block even if the minimums aren't met yet
 - [ ] a script for load testing a server and calculating its hard and soft limits
+- [ ] use https://github.com/dherman/esprit or similar to parse https://github.com/DefiLlama/chainlist/blob/main/constants/extraRpcs.js
+- [ ] update example.toml
+  - might need to make changes so the influxdb stuff is optional. david said it stopped right after starting
+- [ ] i'm seeing a bunch of errors with eth_getLogs.
+  - i think maybe my block number rewriting is causing problems. but maybe it's just a user doing bad queries
+- [ ] Use "is_fresh" instead of our atomic bool
+  - moka 0.10 - Add entry and entry_by_ref APIs to sync and future caches (#193):
+    They allow users to perform more complex operations on a cache entry. At this point, the following operations (methods) are provided:
+      or_default
+      or_insert
+      or_insert_with
+      or_insert_with_if
+      or_optionally_insert_with
+      or_try_insert_with
+    The above methods return Entry type, which provides is_fresh method to check if the value was freshly computed or already existed in the cache.
+- [ ] lag message always shows on first response
+  - http interval on blastapi lagging by 1!
 
 ## V2
 
diff --git a/rate-counter/src/lib.rs b/rate-counter/src/lib.rs
index 11a8359e..23f41099 100644
--- a/rate-counter/src/lib.rs
+++ b/rate-counter/src/lib.rs
@@ -1,4 +1,6 @@
 //! A counter of events in a time period.
+//!
+//! TODO: maybe better to do something like this though: https://github.com/facebookarchive/metrics/blob/master/ewma.go
 
 use std::collections::VecDeque;
 use tokio::time::{Duration, Instant};
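The rate-counter TODO above links facebookarchive/metrics' ewma.go as a possible replacement for the crate's VecDeque of timestamps. A minimal sketch of that EWMA technique, shown as an illustration of the approach rather than the crate's planned API:

```rust
use std::time::{Duration, Instant};

/// A rate counter that decays a single smoothed rate toward the
/// instantaneous rate, instead of storing every event timestamp.
struct EwmaRateCounter {
    /// smoothed events-per-second
    rate: f64,
    /// time constant: a larger tau is smoother but slower to react
    tau: Duration,
    last_tick: Instant,
}

impl EwmaRateCounter {
    fn new(tau: Duration) -> Self {
        Self { rate: 0.0, tau, last_tick: Instant::now() }
    }

    /// record `count` events that happened since the last call
    fn tick(&mut self, count: u64) {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_tick).as_secs_f64();
        self.last_tick = now;
        if elapsed <= 0.0 {
            // zero-length interval: skip (a real impl would buffer the count)
            return;
        }
        let instant_rate = count as f64 / elapsed;
        // standard exponential decay: alpha approaches 1 as the gap grows,
        // so stale rates fade out quickly after a long quiet period
        let alpha = 1.0 - (-elapsed / self.tau.as_secs_f64()).exp();
        self.rate += alpha * (instant_rate - self.rate);
    }

    fn rate_per_second(&self) -> f64 {
        self.rate
    }
}

fn main() {
    let mut counter = EwmaRateCounter::new(Duration::from_secs(60));
    std::thread::sleep(Duration::from_millis(100));
    counter.tick(10); // 10 events in ~0.1s
    println!("~{:.2} events/sec (smoothed)", counter.rate_per_second());
}
```

The trade-off versus the VecDeque design: constant memory no matter how fast events arrive, but the reported rate is a smoothed estimate rather than an exact count over a window.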