Did you recently experience any downtime or outages of your node?
You should do the following:
- Stop lnd
- Copy the channel.db file into a separate folder
- In that separate folder, run chantools compactdb and check the result (see https://github.com/lightninglabs/chantools and https://github.com/lightninglabs/chantools/blob/master/doc/chantools_compactdb.md)
from lnd.
Can you check the actual standard out of the daemon (and not just the log)? Do you see a panic/stack trace?
How do I do that exactly?
When I run lnd I get:
2024-03-13 11:47:14.649 [INF] LTND: Opening the main database, this might take a few minutes...
2024-03-13 11:47:14.649 [INF] LTND: Opening bbolt database, sync_freelist=false, auto_compact=false
panic: freepages: failed to get all reachable pages (page 256270: multiple references (stack: [256270 255578 256270]))
goroutine 273 [running]:
go.etcd.io/bbolt.(*DB).freepages.func2()
go.etcd.io/[email protected]/db.go:1178 +0x8d
created by go.etcd.io/bbolt.(*DB).freepages in goroutine 1
go.etcd.io/[email protected]/db.go:1176 +0x1e5
Yeah, unfortunately if you can't compact the DB anymore it's very likely in a borked state and won't start up again. Restoring from seed and SCB is the safest option, although that will close all channels, which sucks.
But I'm not aware of any way to fix a borked database, unfortunately. Did you have a power outage, or did you have to kill lnd (an unclean shutdown) at some point? That could explain the problem.
If you restore the node from the seed, you might want to start with a SQLite backend, which is more resilient against those sorts of issues.
I'm closing the issue, since there's not really anything more to do here.
Thanks for flagging this issue. Indeed this takes a long time. Do you know how big the db was when you started the compaction? Have you ever compacted the db before?
The db, I believe, is compacted automatically whenever the process is restarted.
Last restart appears to have happened on Jan 31:
2024-01-31T00:43:15Z
Unsure when that last occurred before this.
Here is my lnd/data/graph/mainnet dir (user redacted):
-rw-r--r-- 1 user user 1023M Mar 5 10:09 /home/user/.lnd/data/graph/mainnet/channel.db
-rw------- 1 user user 8 Jan 30 18:51 /home/user/.lnd/data/graph/mainnet/channel.db.last-compacted
-rw-r--r-- 1 user user 132M Mar 5 10:08 /home/user/.lnd/data/graph/mainnet/sphinxreplay.db
-rw------- 1 user user 8 Jan 30 18:51 /home/user/.lnd/data/graph/mainnet/sphinxreplay.db.last-compacted
-rw-r--r-- 1 user user 65M Mar 8 04:11 /home/user/.lnd/data/graph/mainnet/temp-dont-use.db
-rw-r--r-- 1 user user 33M Mar 5 10:08 /home/user/.lnd/data/graph/mainnet/wtclient.db
-rw------- 1 user user 8 Jan 30 18:51 /home/user/.lnd/data/graph/mainnet/wtclient.db.last-compacted
I have now tried to restart without compacting the bbolt database to see if that helps, but it seems to be stuck at "opening" for over an hour already. I'm really not sure about the difference between the bbolt db and the channel db and what could be causing this...
Changed lnd.conf to:
db.bolt.auto-compact=false
logs:
2024-03-08 03:48:17.282 [INF] LTND: Opening the main database, this might take a few minutes...
2024-03-08 03:48:17.282 [INF] LTND: Opening bbolt database, sync_freelist=false, auto_compact=false
2024-03-08 04:48:26.182 [INF] LTND: Opening the main database, this might take a few minutes...
2024-03-08 04:48:26.182 [INF] LTND: Opening bbolt database, sync_freelist=false, auto_compact=false
It might help (a lot) to run cat channel.db > /dev/null while lnd is starting/compacting. It seems your system only managed to write 65 MByte after 18 hours of compacting; you might want to check your disk (or storage backend) health.
The message "Opening the main database" should only appear once during startup. In your logs it is shown twice, with almost exactly one hour in between. Do you maybe re-start lnd automatically (if it doesn't start up completely within a timeout)?
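For illustration, a systemd unit shaped like the following could produce exactly that pattern. The unit name and values here are assumptions, not taken from this thread:

```
# /etc/systemd/system/lnd.service (hypothetical sketch)
[Service]
ExecStart=/usr/local/bin/lnd
# With Type=notify, systemd waits for lnd to signal readiness; if the
# database open takes longer than this, systemd kills the process:
Type=notify
TimeoutStartSec=3600
# ...and these lines then restart it, producing the repeating log:
Restart=always
RestartSec=10
```

Raising TimeoutStartSec (or setting it to infinity) while the database is being opened or compacted would avoid the kill-and-restart loop.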
It might help (a lot) to run cat channel.db > /dev/null while lnd is starting/compacting. It seems your system only managed to write 65 MByte after 18 hours of compacting, you might want to check your disk (or storage backend) health.
Disk health is good. This is running on a server cluster that is quite healthy.
Running cat /home/user/.lnd/data/graph/mainnet/channel.db >> /dev/null does not seem to help.
Log keeps repeating - I assume this is lnd crashing and the service auto-restarting:
2024-03-08 06:57:12.929 [INF] LTND: Version: 0.17.3-beta commit=v0.17.3-beta, build=production, logging=default, debuglevel=info
2024-03-08 06:57:12.929 [INF] LTND: Active chain: Bitcoin (network=mainnet)
2024-03-08 06:57:12.930 [INF] RPCS: RPC server listening on [::]:10009
2024-03-08 06:57:12.930 [INF] RPCS: RPC server listening on 0.0.0.0:10009
2024-03-08 06:57:12.935 [INF] RPCS: gRPC proxy started at [::]:10080
2024-03-08 06:57:12.935 [INF] RPCS: gRPC proxy started at 0.0.0.0:10080
2024-03-08 06:57:12.935 [INF] LTND: Opening the main database, this might take a few minutes...
2024-03-08 06:57:12.935 [INF] LTND: Opening bbolt database, sync_freelist=false, auto_compact=false
2024-03-08 06:57:24.181 [INF] LTND: Version: 0.17.3-beta commit=v0.17.3-beta, build=production, logging=default, debuglevel=info
2024-03-08 06:57:24.181 [INF] LTND: Active chain: Bitcoin (network=mainnet)
2024-03-08 06:57:24.182 [INF] RPCS: RPC server listening on [::]:10009
2024-03-08 06:57:24.182 [INF] RPCS: RPC server listening on 0.0.0.0:10009
2024-03-08 06:57:24.185 [INF] RPCS: gRPC proxy started at [::]:10080
2024-03-08 06:57:24.186 [INF] RPCS: gRPC proxy started at 0.0.0.0:10080
2024-03-08 06:57:24.186 [INF] LTND: Opening the main database, this might take a few minutes...
2024-03-08 06:57:24.186 [INF] LTND: Opening bbolt database, sync_freelist=false, auto_compact=false
How do you start lnd? Maybe systemd is doing weird things?
Can you check the actual standard out of the daemon (and not just the log)? Do you see a panic/stack trace?
How do you start lnd? Maybe systemd is doing weird things?
Yeah, I am using systemd services and Docker...
Can you check the actual standard out of the daemon (and not just the log)? Do you see a panic/stack trace?
How do I do that exactly?
Ugh, that looks like data corruption in the freepages. Did you attempt the steps in this comment? #8532 (comment)
Ugh, that looks like data corruption in the freepages. Did you attempt the steps in this comment? #8532 (comment)
Yes, actually I did try it. I ran this a few times:
chantools compactdb --sourcedb channel.db --destdb ./results/compacted.db
...and it seems to stall: there are no errors in the log, and the compacted version seems to be stuck at a size of 57M.
I'll run it in a screen for a few hours to see if it is still stuck at 57M.
Does not seem to pass 57M @guggero @hieblmi
Any other ideas?
Or do I just clear data and restore seed and channel backup...
What are the steps to restore from seed and SCB?
You can find the steps here https://docs.lightning.engineering/lightning-network-tools/lnd/disaster-recovery.
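At its core, the restore in that guide comes down to re-creating the wallet from the seed and then feeding lncli the channel.backup file. This is only a sketch; the backup path is an assumption, and the linked guide is the authoritative reference.

```shell
# Sketch only -- follow the linked disaster-recovery guide for the
# authoritative steps. The backup file path is an assumption.
if ! command -v lncli >/dev/null 2>&1; then
    echo "lncli not installed"
    exit 0
fi
# 1. Re-create the wallet from your 24-word aezeed seed:
lncli create
# 2. Restore all channels from the multi-channel backup file:
lncli restorechanbackup --multi_file "$HOME/channel.backup"
```

Note that an SCB restore force-closes all restored channels; the funds return to the on-chain wallet after the closes confirm.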