lightningdevkit / rapid-gossip-sync-server
SNAPSHOT_CALCULATION_INTERVAL is set to a constant 24 hours. Allow making it configurable, which would be useful for Network::Signet instances.
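A minimal sketch of one way to make this configurable via an environment variable; the variable name RGS_SNAPSHOT_INTERVAL and the seconds unit are assumptions, not an existing option:

fn snapshot_calculation_interval() -> u64 {
    // Hypothetical env var; falls back to the current hard-coded 24-hour value.
    std::env::var("RGS_SNAPSHOT_INTERVAL")
        .ok()
        .and_then(|v| v.parse::<u64>().ok())
        .unwrap_or(60 * 60 * 24)
}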
The https://rapidsync.lightningdevkit.org/snapshot/ endpoint will return 404 in all of these cases:
- https://rapidsync.lightningdevkit.org/snapshot/1663632000 (recent timestamp, no data?)
- https://rapidsync.lightningdevkit.org/snapshot/1663657489 (random timestamp)
- https://rapidsync.lightningdevkit.org/snapshot/-1 (incorrect parameter)
- https://rapidsync.lightningdevkit.org/snapshot/fehufghweku/32 (incorrect URL)
It would be better to indicate the absence of data with either an empty payload or a payload that has 0s for the nodeId/announcement/update counts.
Currently, the announcement inclusion logic lives here:
https://github.com/lightningdevkit/rapid-gossip-sync-server/blob/main/src/serialization.rs#L134
We need to update it such that if a channel at any point had an interval exceeding two weeks between updates in either direction (since the client's last seen timestamp), an announcement is also included.
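A rough sketch of the proposed check, using hypothetical inputs rather than the actual serialization.rs data structures: for each direction, walk the update timestamps seen since the client's last-seen timestamp and flag the announcement for inclusion if any gap between consecutive updates exceeded two weeks.

const TWO_WEEKS_SECS: u32 = 14 * 24 * 60 * 60;

// `update_timestamps` is assumed to be sorted ascending and to start at (or
// before) the client's last-seen timestamp for one direction of the channel.
fn needs_announcement(update_timestamps: &[u32]) -> bool {
    update_timestamps
        .windows(2)
        .any(|pair| pair[1].saturating_sub(pair[0]) > TWO_WEEKS_SECS)
}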
It seems that tokio-postgres 0.7.8 requires postgres-protocol 0.6.5, which requires base64 0.21.0, which breaks on Rust 1.48. We should figure out a way to either fix it upstream or see whether pinning an older version fixes it.
Some RGS clients need address data for sockets. We should keep track of and serialize changes to node addresses in RGS snapshots.
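A sketch of the kind of record this would need; the struct and field names are hypothetical, not the server's schema. The idea is to persist each node's latest advertised addresses and detect changes so that address deltas can be serialized into snapshots.

use std::net::SocketAddr;

#[derive(Clone, PartialEq, Eq)]
struct NodeAddressRecord {
    node_id: [u8; 33],          // compressed public key
    addresses: Vec<SocketAddr>, // latest advertised socket addresses
    last_update_timestamp: u32,
}

// True if a newly seen node_announcement changes the stored addresses and
// should therefore be included in the next snapshot delta.
fn addresses_changed(stored: &NodeAddressRecord, seen: &NodeAddressRecord) -> bool {
    stored.addresses != seen.addresses
}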
Components:
According to the code, the server should wait to run the Snapshotter until we're caught up with gossip, but instead it runs the Snapshotter just moments after bootup, after receiving only a small number of gossip messages, which creates incomplete/invalid snapshots.
Log:
Feb 09 13:19:27 rapid-gossip-sync-server[7199]: Starting gossip download
trimmed...
Feb 09 13:19:42 rapid-gossip-sync-server[7199]: gossip count (iteration 3): 18 (delta: 13):
Feb 09 13:19:42 rapid-gossip-sync-server[7199]: announcements: 7
Feb 09 13:19:42 rapid-gossip-sync-server[7199]: mismatched scripts: 0
Feb 09 13:19:42 rapid-gossip-sync-server[7199]: updates: 11
Feb 09 13:19:42 rapid-gossip-sync-server[7199]: no HTLC max: 0
Feb 09 13:19:42 rapid-gossip-sync-server[7199]: caught up with gossip!
Feb 09 13:19:42 rapid-gossip-sync-server[7199]: Initial sync complete!
Feb 09 13:19:42 rapid-gossip-sync-server[7199]: Initiating snapshotting service
I use a load balancer health check to make sure the snapshot exists before forwarding user requests to it, but this behavior makes it impossible to know whether an RGS server is fully caught up and has ready-to-use snapshots. Is this intended behavior?
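For reference, a sketch of one way Snapshotter startup could be gated on a genuine caught-up signal, using a tokio oneshot channel; the names and the caught-up heuristic are placeholders, not the server's actual wiring:

use tokio::sync::oneshot;

async fn run(start_snapshotter: impl FnOnce()) {
    let (caught_up_tx, caught_up_rx) = oneshot::channel::<()>();

    tokio::spawn(async move {
        // ... gossip download loop; signal only once the per-iteration delta
        // has stayed at zero for several consecutive iterations, rather than
        // after the first small batch of messages ...
        let _ = caught_up_tx.send(());
    });

    // The snapshotting service only starts after that signal.
    let _ = caught_up_rx.await;
    start_snapshotter();
}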
In downloader.rs, we trigger the storage of gossip messages from get_and_clear_pending_msg_events, but we also trigger the same storage from RoutingMessageHandler.
We should ask our past selves why we did that, and either document that requirement better or get rid of the dupes.
Noticed this when trying to make a 1M sat payment today:
node_id: 031f2669adab71548fad4432277a0d90233e3bc07ac29cfb0b3e01bd3fb26cb9fa, short_channel_id: 873706024403271681, fee_msat: 101060
But the payment was failing at that channel for incorrect_fee reasons.
On my RGS postgresql database, this is what I was seeing, expecting 130 PPM for the fee:
SELECT id, short_channel_id, timestamp, channel_flags, direction, disable, cltv_expiry_delta,
       htlc_minimum_msat, fee_base_msat, fee_proportional_millionths, htlc_maximum_msat, seen
FROM channel_updates
WHERE short_channel_id = 873706024403271681
ORDER BY direction, seen;
id | short_channel_id | timestamp | channel_flags | direction | disable | cltv_expiry_delta | htlc_minimum_msat | fee_base_msat | fee_proportional_millionths | htlc_maximum_msat | seen
---------+--------------------+------------+---------------+-----------+---------+-------------------+-------------------+---------------+-----------------------------+-------------------+----------------------------
15502 | 873706024403271681 | 1693507369 | 0 | f | f | 100 | 1000 | 1000 | 38 | 10000000000 | 2023-09-01 05:35:11.748784
247841 | 873706024403271681 | 1693593990 | 0 | f | f | 100 | 1000 | 1000 | 38 | 10000000000 | 2023-09-01 18:52:43.61318
447515 | 873706024403271681 | 1693680390 | 0 | f | f | 100 | 1000 | 1000 | 38 | 10000000000 | 2023-09-02 18:49:44.298348
585121 | 873706024403271681 | 1693749109 | 0 | f | f | 100 | 1000 | 1000 | 200 | 10000000000 | 2023-09-03 13:54:29.804143
729121 | 873706024403271681 | 1693836991 | 0 | f | f | 100 | 1000 | 1000 | 200 | 10000000000 | 2023-09-04 14:19:38.937046
931701 | 873706024403271681 | 1693925190 | 0 | f | f | 100 | 1000 | 1000 | 200 | 10000000000 | 2023-09-05 14:51:16.707509
1112401 | 873706024403271681 | 1694008323 | 0 | f | f | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-06 13:56:51.222476
1136977 | 873706024403271681 | 1694018628 | 2 | f | t | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-06 16:47:14.999251
1138764 | 873706024403271681 | 1694019342 | 0 | f | f | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-06 17:00:21.656834
1148384 | 873706024403271681 | 1694023548 | 2 | f | t | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-06 18:09:01.657005
1149419 | 873706024403271681 | 1694023850 | 0 | f | f | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-06 18:16:00.804636
1188505 | 873706024403271681 | 1694042148 | 2 | f | t | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-06 23:19:21.038322
1189091 | 873706024403271681 | 1694042448 | 0 | f | f | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-06 23:23:40.936604
1293025 | 873706024403271681 | 1694092428 | 2 | f | t | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-07 13:17:40.79054
1293393 | 873706024403271681 | 1694092743 | 0 | f | f | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-07 13:21:34.623002
1297879 | 873706024403271681 | 1694094948 | 2 | f | t | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-07 13:59:46.285513
1298703 | 873706024403271681 | 1694095329 | 0 | f | f | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-07 14:06:06.578566
1309753 | 873706024403271681 | 1694100468 | 2 | f | t | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-07 15:31:06.75473
1310321 | 873706024403271681 | 1694100754 | 0 | f | f | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-07 15:35:41.105463
1322167 | 873706024403271681 | 1694106168 | 2 | f | t | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-07 17:07:25.934965
1325240 | 873706024403271681 | 1694107716 | 0 | f | f | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-07 17:32:15.032081
1336331 | 873706024403271681 | 1694111988 | 2 | f | t | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-07 18:43:15.95753
1345006 | 873706024403271681 | 1694116170 | 0 | f | f | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-07 19:54:11.635986
1356864 | 873706024403271681 | 1694121588 | 2 | f | t | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-07 21:25:52.619199
1365139 | 873706024403271681 | 1694124869 | 0 | f | f | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-07 22:18:06.76145
1370112 | 873706024403271681 | 1694126868 | 2 | f | t | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-07 22:51:01.339276
1373550 | 873706024403271681 | 1694128376 | 0 | f | f | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-07 23:15:50.330677
1482994 | 873706024403271681 | 1694182848 | 2 | f | t | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-08 14:24:02.545283
1483284 | 873706024403271681 | 1694182916 | 0 | f | f | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-08 14:25:39.503017
1486637 | 873706024403271681 | 1694184408 | 2 | f | t | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-08 14:50:52.477101
1491688 | 873706024403271681 | 1694186566 | 0 | f | f | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-08 15:28:22.063673
1504741 | 873706024403271681 | 1694192688 | 2 | f | t | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-08 17:09:30.906925
1506395 | 873706024403271681 | 1694193560 | 0 | f | f | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-08 17:23:11.27144
1522609 | 873706024403271681 | 1694201088 | 2 | f | t | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-08 19:29:05.838715
1522813 | 873706024403271681 | 1694201131 | 0 | f | f | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-08 19:30:51.187243
1542392 | 873706024403271681 | 1694210688 | 2 | f | t | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-08 22:10:42.436836
1544634 | 873706024403271681 | 1694211576 | 0 | f | f | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-08 22:23:52.552234
1556311 | 873706024403271681 | 1694216928 | 2 | f | t | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-08 23:52:01.197772
1562437 | 873706024403271681 | 1694219924 | 0 | f | f | 100 | 1000 | 1000 | 209 | 10000000000 | 2023-09-09 00:41:41.696928
1594604 | 873706024403271681 | 1694267536 | 0 | f | f | 100 | 1000 | 1000 | 210 | 10000000000 | 2023-09-09 16:58:00.938937
1791353 | 873706024403271681 | 1694355390 | 0 | f | f | 100 | 1000 | 1000 | 210 | 10000000000 | 2023-09-10 14:21:20.247764
1971384 | 873706024403271681 | 1694441790 | 0 | f | f | 100 | 1000 | 1000 | 210 | 10000000000 | 2023-09-11 14:17:29.261606
2007489 | 873706024403271681 | 1694457048 | 2 | f | t | 100 | 1000 | 1000 | 210 | 10000000000 | 2023-09-11 18:32:34.654661
2011327 | 873706024403271681 | 1694458808 | 0 | f | f | 100 | 1000 | 1000 | 210 | 10000000000 | 2023-09-11 19:02:35.477949
2160782 | 873706024403271681 | 1694526734 | 0 | f | f | 100 | 1000 | 1000 | 200 | 10000000000 | 2023-09-12 13:53:31.471492
2370180 | 873706024403271681 | 1694614590 | 0 | f | f | 100 | 1000 | 1000 | 200 | 10000000000 | 2023-09-13 14:17:30.858651
2505861 | 873706024403271681 | 1694672148 | 2 | f | t | 100 | 1000 | 1000 | 200 | 10000000000 | 2023-09-14 06:18:34.141987
2594264 | 873706024403271681 | 1694707381 | 0 | f | f | 100 | 1000 | 1000 | 200 | 10000000000 | 2023-09-14 16:13:16.351301
2806404 | 873706024403271681 | 1694794765 | 0 | f | f | 100 | 1000 | 1000 | 200 | 10000000000 | 2023-09-15 16:23:04.327612
5239346 | 873706024403271681 | 1695909301 | 0 | f | f | 100 | 1000 | 1000 | 130 | 10000000000 | 2023-09-28 20:27:42.790376
15501 | 873706024403271681 | 1693300588 | 1 | t | f | 34 | 1 | 1000 | 10 | 9900000000 | 2023-09-01 05:35:11.747999
241922 | 873706024403271681 | 1693591751 | 3 | t | t | 34 | 1 | 1000 | 10 | 9900000000 | 2023-09-01 18:13:44.225598
242207 | 873706024403271681 | 1693591752 | 1 | t | f | 34 | 1 | 1000 | 10 | 9900000000 | 2023-09-01 18:15:45.392541
5239345 | 873706024403271681 | 1695003621 | 1 | t | f | 34 | 1 | 1000 | 10 | 9900000000 | 2023-09-28 20:27:42.789177
However, on the client side I was seeing 100 PPM + 1 base:
network graph: features: 0000, node_one: 031f2669adab71548fad4432277a0d90233e3bc07ac29cfb0b3e01bd3fb26cb9fa, one_to_two: Some(ChannelUpdateInfo { last_update: 1695326400, enabled: true, cltv_expiry_delta: 100, htlc_minimum_msat: 1000, htlc_maximum_msat: 10000000000, fees: RoutingFees { base_msat: 1000, proportional_millionths: 38 }, last_update_message: None }), node_two: 03aefa43fbb4009b21a4129d05953974b7dbabbbfb511921410080860fca8ee1f0, two_to_one: Some(ChannelUpdateInfo { last_update: 1695326400, enabled: true, cltv_expiry_delta: 34, htlc_minimum_msat: 1, htlc_maximum_msat: 9900000000, fees: RoutingFees { base_msat: 1000, proportional_millionths: 10 }, last_update_message: None })
We do hourly syncs and even forced a few syncs (and confirmed the message/update counts lined up), but the 38 value persisted. When we pointed to your RGS server today, it was showing 100 instead of 130 for the fee. I'm not sure what historical values you had for the LDK RGS server, but one funny thing is that when we went back in time to the first update we had stored for the channel, it was at 38 PPM.
If you're able to reproduce an RGS client showing 100 for this channel, I'm curious whether the 100 PPM is the first entry you have on that server. I was able to solve this by wiping my DB and resyncing gossip. We only do full sync retrieval in Mutiny, so it's not a problem for us, but it would be quite annoying to keep doing that every day, and it would be unreasonable for you guys to do in order to support partial syncs.
Also, a maybe-unrelated note: this channel either had stopped updating, or there was a problem with our RGS in that it stopped syncing updates. We didn't get anything for this channel from 9-15 to 9-28 until we restarted the server today and got that last entry.
A user trying to run the RGS server on the regtest network stumbled across the fact that the server currently assumes snapshots at reference_timestamp + SNAPSHOT_CALCULATION_INTERVAL * {1, 2, 3, 4, 5, 6, 7, 14, 21} to be present.
The SNAPSHOT_CALCULATION_INTERVAL of 24h is hence rather impractical for testing purposes, as it requires a user wanting to test the full snapshot range to wait up to three weeks.
We should therefore either make SNAPSHOT_CALCULATION_INTERVAL user-configurable, or provide a dedicated testing mode which does the same.
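An illustrative sketch, not the server's actual lookup code, of why every multiple needs to exist: a client's snapshot age is effectively mapped down to one of these precomputed buckets, so a missing bucket means there is no snapshot to serve.

const SNAPSHOT_CALCULATION_INTERVAL: u64 = 60 * 60 * 24; // seconds
const SNAPSHOT_MULTIPLES: [u64; 9] = [1, 2, 3, 4, 5, 6, 7, 14, 21];

// Picks the largest precomputed bucket not exceeding the client's snapshot
// age, falling back to the smallest bucket.
fn bucket_for_age(age_seconds: u64) -> u64 {
    SNAPSHOT_MULTIPLES
        .iter()
        .rev()
        .find(|&&m| m * SNAPSHOT_CALCULATION_INTERVAL <= age_seconds)
        .copied()
        .unwrap_or(1)
}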
Postgres refuses to use indexes if you have a write-only database and have done a bunch of writes. Primarily this is Postgres being dumb, but we should document that users should tune the Postgres autovacuum parameters to avoid queries doing on-disk sorting.
RGS randomly hangs after some time.
RGS does not hang if the database is slow.
#[tokio::main(flavor = "multi_thread", worker_threads = 2)]
LN_PEERS=02eadbd9e7557375161df8b646776a547c5cbc2e95b3071ec81553f8ec2cea3b8c@18.191.253.246:9735,03bae2db4b57738c1ec1ffa1c5e5a4423968cc592b3b39cddf7d495e72919d6431@18.202.91.172:9735,038863cf8ab91046230f561cd5b386cbff8309fa02e3f0c3ed161a3aeb64a643b9@203.132.94.196:9735
(more peers → higher chance of hangs)
The problem is with the mpsc::channel(100) for GossipMessage. There is a task which receives elements in GossipPersister and tasks which send elements in GossipRouter.
GossipRouter uses the try_send() method to send without blocking the thread, but if that fails (when the channel is full) it falls back to the blocking send(), which blocks the thread. But this thread is a tokio executor thread: if all tokio executor threads get blocked in GossipRouter::new_channel_announcement() or GossipRouter::new_channel_update(), then the task that receives elements from the channel will never be executed. Deadlock.
Note that there is code in GossipRouter which tries to minimize the risk of deadlock, but unfortunately it does not eliminate it:
tokio::task::block_in_place(move || {
    tokio::runtime::Handle::current().block_on(async move {
        self.sender.send(gossip_message).await.unwrap();
    })
});
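One possible mitigation, sketched here with hypothetical wiring (not the project's actual fix): when try_send() fails, hand the message to a dedicated OS thread that forwards it into the tokio channel with blocking_send(), so no executor thread is ever parked on the full channel.

use tokio::sync::mpsc;

struct GossipMessage; // placeholder for the real message type

fn spawn_forwarder(tokio_tx: mpsc::Sender<GossipMessage>) -> std::sync::mpsc::Sender<GossipMessage> {
    let (std_tx, std_rx) = std::sync::mpsc::channel::<GossipMessage>();
    std::thread::spawn(move || {
        while let Ok(msg) = std_rx.recv() {
            // blocking_send() is safe here: this is a plain OS thread, not a
            // tokio executor thread, so the runtime keeps making progress.
            if tokio_tx.blocking_send(msg).is_err() {
                break; // the GossipPersister receiver was dropped
            }
        }
    });
    std_tx
}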
At the moment, if the following unwrap fails in GossipPersister::persist_gossip(), the tokio task panics, but the server continues running. I think it would be useful to kill the whole application if the persist_gossip() task panics.
let connection_config = config::db_connection_config();
let (mut client, connection) =
    connection_config.connect(NoTls).await.unwrap();
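One simple way to get that behavior, sketched under the assumption that a process-wide policy is acceptable: install a panic hook at startup that exits the process after the default hook has printed the panic, so a panicking persist_gossip() task takes the server down instead of leaving it limping along.

let default_hook = std::panic::take_hook();
std::panic::set_hook(Box::new(move |panic_info| {
    // Let the default hook print the panic message and backtrace first.
    default_hook(panic_info);
    std::process::exit(1);
}));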
Seeing a flood of these log entries every now and then on mobile clients. They process rather quickly, but something like 5k-10k of them show up when it happens. I haven't yet identified why or when it shows up, or even whether it affects anything, but I assume the latest graph isn't updated when it happens. I have not been able to reproduce it yet.
2023-07-06 00:22:21.305 TRACE [lightning_rapid_gossip_sync::processing:191] Skipping application of channel update for chan 806662203334393856 with flags 138 as original data is missing.
2023-07-06 00:22:21.305 TRACE [lightning_rapid_gossip_sync::processing:191] Skipping application of channel update for chan 806708382877548545 with flags 145 as original data is missing.
2023-07-06 00:22:21.305 TRACE [lightning_rapid_gossip_sync::processing:191] Skipping application of channel update for chan 806734771077971969 with flags 129 as original data is missing.
2023-07-06 00:22:21.305 TRACE [lightning_rapid_gossip_sync::processing:191] Skipping application of channel update for chan 806797443309961216 with flags 128 as original data is missing.
2023-07-06 00:22:21.305 TRACE [lightning_rapid_gossip_sync::processing:191] Skipping application of channel update for chan 806853518323941376 with flags 130 as original data is missing.
2023-07-06 00:22:21.305 TRACE [lightning_rapid_gossip_sync::processing:191] Skipping application of channel update for chan 806862314562715651 with flags 153 as original data is missing.
2023-07-06 00:22:21.305 TRACE [lightning_rapid_gossip_sync::processing:191] Skipping application of channel update for chan 806882105656344577 with flags 128 as original data is missing.
2023-07-06 00:22:21.305 TRACE [lightning_rapid_gossip_sync::processing:191] Skipping application of channel update for chan 806882105656344577 with flags 129 as original data is missing.
2023-07-06 00:22:21.305 TRACE [lightning_rapid_gossip_sync::processing:191] Skipping application of channel update for chan 806889802184785920 with flags 128 as original data is missing.
2023-07-06 00:22:21.305 TRACE [lightning_rapid_gossip_sync::processing:191] Skipping application of channel update for chan 806898598481035265 with flags 129 as original data is missing.
2023-07-06 00:22:21.305 TRACE [lightning_rapid_gossip_sync::processing:191] Skipping application of channel update for chan 806965668536451073 with flags 136 as original data is missing.
User reported:
if I do an update with a timestamp just a few minutes after the previous update, I still get a ~323 kB blob and the error: failed to update network graph with rapid gossip sync data: LightningError { err: "Update had same timestamp as last processed update", action: IgnoreDuplicateGossip }
It seems we should be returning a dummy update if we get a request for the last generation timestamp. Also, this shouldn't be an error but that's a rust-lightning issue.
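A sketch of that idea with hypothetical names (not the server's actual request handling): when the requested timestamp is at or past the latest generation timestamp, serve a tiny pre-built "no changes" payload with zero announcement/update counts instead of the full blob.

fn select_snapshot_bytes(
    requested_ts: u64,
    latest_generation_ts: u64,
    full_snapshot: &[u8],
    empty_delta: &[u8],
) -> Vec<u8> {
    if requested_ts >= latest_generation_ts {
        // Client is already caught up: return the dummy/empty update.
        empty_delta.to_vec()
    } else {
        full_snapshot.to_vec()
    }
}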
Sorry, this is probably more a question than an actual bug.
I am trying to run this using Docker (Kubernetes connected to a Polar-based regtest), and my server appears to be connected to at least two of the running LNDs in the network; I am getting the graph and so on.
But the server is not acting as a server: no bind ports. In the Dockerfile I saw that port 8000 is supposed to be exposed, and it presumably needs to bind to some port if this is the real service behind https://rapidsync.lightningdevkit.org/.
Any idea what could be happening?
So while we'd previously fixed hang issues, there's still one remaining: if the DB backs up and we start blocking in the peer message handler, that's fine, except that it's holding the peer read lock in LDK's peer_handler. Because that's a sync lock, any future calls into LDK's peer_handler which require the same peer's lock or a total peer write lock will immediately block until the DB catches up. That's all fine, unless there's enough work going on in the peer_handler to block enough tokio tasks that we no longer make progress on the DB, which can happen if there are lots of connections churning or not very many threads.
Not sure how to solve this; it may ultimately need a may-block-on-async flag in lightning-net-tokio, which would then make lightning-net-tokio much less efficient by calling block_in_place, but would allow this type of application.
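A rough sketch of the trade-off such a flag implies; this is not an existing lightning-net-tokio API, just an illustration. Wrapping the possibly-blocking handler call in block_in_place releases the tokio worker to run other tasks (such as the DB persister), at the cost of requiring a multi-threaded runtime and paying block_in_place overhead on every call.

fn dispatch_handler_call<R>(may_block_on_async: bool, call: impl FnOnce() -> R) -> R {
    if may_block_on_async {
        // Moves other queued work off this worker thread before running the
        // call, so a handler stalled on the DB cannot starve the runtime.
        tokio::task::block_in_place(call)
    } else {
        call()
    }
}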