lots of todos

2023-02-25 15:07:05 -08:00 · dd9233d89b
commit dd9233d89b
parent 28ac542bc9
2 changed files with 51 additions and 8 deletions
--- a/TODO.md
+++ b/TODO.md
@@ -335,9 +335,29 @@ These are not yet ordered. There might be duplicates. We might not actually need
 - [x] don't cache methods that are usually very large
 - [x] use http provider when available
 - [x] per-chain rpc rate limits
-- [-] if we subscribe to a server that is syncing, it gives us null block_data_limit. when it catches up, we don't ever send queries to it. we need to recheck block_data_limit
+- [x] canonical block checks giving weird errors. change healthcheck to use block number
+    [2023-02-21T02:58:06Z DEBUG web3_proxy::rpcs::request] error response from blastapi! method=eth_getCode params=(0xa9a8760b8333efae8c9c751e6695a11938ae4b90, 0x73a627f588338804e6dc880154728484f7e0373c29057408c6674d75bdc29d12) err=JsonRpcClientError(JsonRpcError(JsonRpcError { code: -32603, message: "hash 73a627f588338804e6dc880154728484f7e0373c29057408c6674d75bdc29d12 is not currently canonical", data: None }))
+    [2023-02-21T02:58:06Z DEBUG web3_proxy::rpcs::one] blastapi failed health check query! Error {
+            context: "ProviderError from the backend",
+            source: JsonRpcClientError(
+                JsonRpcError(
+                    JsonRpcError {
+                        code: -32603,
+                        message: "hash 73a627f588338804e6dc880154728484f7e0373c29057408c6674d75bdc29d12 is not currently canonical",
+                        data: None,
+                    },
+                ),
+            ),
+        }
 - [x] add a "failover" tier that is only used if balanced_rpcs has "no servers synced"
  - use this tier (and private tier) to check timestamp on latest block. if we are behind that by more than a few seconds, something is wrong
+- [x] cli flag to set prometheus port
+- [x] eth_getLogs is going to unsynced nodes because it only checks start block and not the end block
+- [x] have multiple providers on each backend rpc. one websocket for newHeads. and then http providers for handling requests
+  - erigon only streams the JSON over HTTP. that code isn't enabled for websockets. so this should save memory on the erigon servers
+  - i think this also means we don't need to worry about changing the id that the user gives us.
+- [x] eth_getLogs is going to unsynced nodes because it only checks start block and not the end block
+- [-] if we subscribe to a server that is syncing, it gives us null block_data_limit. when it catches up, we don't ever send queries to it. we need to recheck block_data_limit
 - [-] proxy mode for benchmarking all backends
 - [-] proxy mode for sending to multiple backends
 - [-] let users choose a % of reverts to log (or maybe x/second). someone like curve logging all reverts will be a BIG database very quickly
@@ -352,10 +372,11 @@ These are not yet ordered. There might be duplicates. We might not actually need
  - create the app without applying any config to it
  - have a blocking future watching the config file and calling app.apply_config() on first load and on change
  - work started on this in the "config_reloads" branch. because of how we pass channels around during spawn, this requires a larger refactor.
-- [ ] have multiple providers on each backend rpc. one websocket for newHeads. and then http providers for handling requests
-  - erigon only streams the JSON over HTTP. that code isn't enabled for websockets. so this should save memory on the erigon servers
-  - i think this also means we don't need to worry about changing the id that the user gives us.
-  - have the healthcheck get the block over http. if it errors, or doesn't match what the websocket says, something is wrong (likely a deadlock in the websocket code)
+- change if premium concurrency limit to be against ip+rpckey
+  - then sites like curve.fi don't have to worry about their user count
+  - it does mean we will have a harder time capacity planning from the number of keys
+- [ ] eth_getLogs is going to unsynced nodes when synced nodes are available. always prefer synced nodes
+- [ ] have the healthcheck get the block over http. if it errors, or doesn't match what the websocket says, something is wrong (likely a deadlock in the websocket code)
 - [ ] don't use new_head_provider anywhere except new head subscription
 - [ ] maybe we shouldn't route eth_getLogs to syncing nodes. serving queries slows down sync significantly
  - change the send_best function to only include servers that are at least close to fully synced
@@ -396,7 +417,6 @@ These are not yet ordered. There might be duplicates. We might not actually need
    - maybe sum available_requests grouped by archive/non-archive. only limit to non-archive if they have enough?
 - [ ] some places we call it "accounting" others a "stat". be consistent
 - [ ] cli commands to search users by key
-- [x] cli flag to set prometheus port
 - [ ] flamegraphs show 25% of the time to be in moka-housekeeper. tune that
 - [ ] config parsing is strict right now. this makes it hard to deploy on git push since configs need to change along with it
 - [ ] when displaying the user's data, they just see an opaque id for their tier. We should join that data
@@ -488,9 +508,30 @@ These are not yet ordered. There might be duplicates. We might not actually need
 - [ ] relevant erigon changelogs: add pendingTransactionWithBody subscription method (#5675)
 - [ ] change_user_tier_by_key should not show the rpc key id. that way our ansible playbook won't expose it
 - [ ] make sure all our responses follow the spec: https://www.jsonrpc.org/specification#examples
-- [ ] min_sum_soft_limit should be automatic based on the apps average rps plus a buffer.
-  - if min_sum_soft_limit > max_sum_soft_limit, just wait for all? emit a warning
+- [ ] min_sum_soft_limit should be automatic based on the app's average rps plus a buffer.
+  - [ ] add a rate counter to the balanced_rpcs
+  - [ ] every time a block is found, update min_sum_soft_limit
+  - [ ] add a min_sum_soft_limit_safety
+      - keeps the automatically calculated limit from going so high that we stop serving requests
+  - [ ] add a min_sum_soft_limit_max_wait that advances the consensus block even if mins not met yet
 - [ ] a script for load testing a server and calculating its hard and soft limits
+- [ ] use https://github.com/dherman/esprit or similar to parse https://github.com/DefiLlama/chainlist/blob/main/constants/extraRpcs.js
+- [ ] update example.toml
+    - might need to make changes so the influxdb stuff is optional. david said it stopped right after starting
+- [ ] i'm seeing a bunch of errors with eth_getLogs.
+    - i think maybe my block number rewriting is causing problems. but maybe it's just a user doing bad queries
+- [ ] Use "is_fresh" instead of our atomic bool
+    - moka 0.10 - Add entry and entry_by_ref APIs to sync and future caches (#193):
+        They allow users to perform more complex operations on a cache entry. At this point, the following operations (methods) are provided:
+            or_default
+            or_insert
+            or_insert_with
+            or_insert_with_if
+            or_optionally_insert_with
+            or_try_insert_with
+        The above methods return Entry type, which provides is_fresh method to check if the value was freshly computed or already existed in the cache.
+- [ ] lag message always shows on first response
+    - http interval on blastapi lagging by 1!

 ## V2

--- a/rate-counter/src/lib.rs
+++ b/rate-counter/src/lib.rs
@@ -1,4 +1,6 @@
 //! A counter of events in a time period.
+//!
+//! TODO: maybe better to do something like this though: https://github.com/facebookarchive/metrics/blob/master/ewma.go
 use std::collections::VecDeque;
 use tokio::time::{Duration, Instant};
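
Note on the new TODO in rate-counter/src/lib.rs: an exponentially weighted moving average (EWMA) could replace the VecDeque of timestamps with a few numbers per counter. The block below is only a rough sketch of that general idea, not a port of the linked ewma.go and not code from this repo. The names `EwmaRate`, `record`, `tick`, and `rate` are hypothetical, and it assumes the caller invokes `tick` once per fixed interval (for example from a tokio interval task).

```rust
use std::time::Duration;

/// Hypothetical EWMA-based rate estimator (events per second).
/// Assumes `tick` is called once every `interval` by the caller.
pub struct EwmaRate {
    /// Smoothing factor in (0, 1); larger reacts faster to change.
    alpha: f64,
    /// Expected time between calls to `tick`.
    interval: Duration,
    /// Events recorded since the last tick.
    uncounted: u64,
    /// Current smoothed rate; `None` until the first tick.
    rate: Option<f64>,
}

impl EwmaRate {
    /// `window` is roughly how far back the average "remembers".
    pub fn new(interval: Duration, window: Duration) -> Self {
        let alpha = 1.0 - (-interval.as_secs_f64() / window.as_secs_f64()).exp();
        Self {
            alpha,
            interval,
            uncounted: 0,
            rate: None,
        }
    }

    /// Record `n` events since the last tick.
    pub fn record(&mut self, n: u64) {
        self.uncounted += n;
    }

    /// Fold the events seen during the last interval into the average.
    pub fn tick(&mut self) {
        let instant = self.uncounted as f64 / self.interval.as_secs_f64();
        self.uncounted = 0;
        self.rate = Some(match self.rate {
            // Move the old rate toward the latest instantaneous rate.
            Some(old) => old + self.alpha * (instant - old),
            // The first tick seeds the average directly.
            None => instant,
        });
    }

    /// Smoothed events per second; 0.0 before the first tick.
    pub fn rate(&self) -> f64 {
        self.rate.unwrap_or(0.0)
    }
}
```

The trade-off is that an EWMA gives an approximate, smoothed rate instead of an exact count over the window, which may or may not be acceptable for something like the automatic min_sum_soft_limit idea in TODO.md.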