Comments (18)

hieblmi commented on May 26, 2024 1

Did you recently experience any downtime or outages of your node?
You should do the following:

  1. Stop lnd
  2. Copy the channel.db file into a separate folder
  3. In that separate folder, run chantools compactdb and check the result (see https://github.com/lightninglabs/chantools and https://github.com/lightninglabs/chantools/blob/master/doc/chantools_compactdb.md)
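The three steps above can be sketched as a small shell helper. This is only a sketch under assumptions: the systemd unit name `lnd`, the paths, and the helper name `copy_db` are illustrative and not part of lnd or chantools. The `--sourcedb` and `--destdb` flags are the ones described in the linked chantools documentation.

```shell
#!/usr/bin/env bash

# copy_db SRC WORKDIR: copy the database into WORKDIR and verify the copy
# byte-for-byte, so that compaction never touches the original file.
# (Hypothetical helper, not part of lnd or chantools.)
copy_db() {
    local src="$1" workdir="$2"
    mkdir -p "$workdir" || return 1
    cp "$src" "$workdir/channel.db" || return 1
    cmp -s "$src" "$workdir/channel.db"
}

# Usage sketch (unit name and paths are assumptions, adjust to your setup):
#   sudo systemctl stop lnd
#   copy_db ~/.lnd/data/graph/mainnet/channel.db ~/compact-work
#   cd ~/compact-work
#   chantools compactdb --sourcedb channel.db --destdb compacted.db
```

Working on a verified copy keeps the original file available if the compaction attempt fails or stalls.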

from lnd.

bensig commented on May 26, 2024 1

Can you check the actual standard out of the daemon (and not just the log)? Do you see a panic/stack trace?

How do I do that exactly?

When I run lnd I get:

2024-03-13 11:47:14.649 [INF] LTND: Opening the main database, this might take a few minutes...
2024-03-13 11:47:14.649 [INF] LTND: Opening bbolt database, sync_freelist=false, auto_compact=false
panic: freepages: failed to get all reachable pages (page 256270: multiple references (stack: [256270 255578 256270]))

goroutine 273 [running]:
go.etcd.io/bbolt.(*DB).freepages.func2()
go.etcd.io/bbolt@<version>/db.go:1178 +0x8d
created by go.etcd.io/bbolt.(*DB).freepages in goroutine 1
go.etcd.io/bbolt@<version>/db.go:1176 +0x1e5

guggero commented on May 26, 2024 1

Yeah, unfortunately if you can't compact the DB anymore, it's very likely in a borked state and won't start up again. Restoring from seed and SCB is the safest option, although that will close all channels, which sucks.
I'm not aware of any way to fix a borked database, unfortunately. Did you have a power outage, or did you have to kill lnd (an unclean shutdown) at some point, which could explain the problem?

If you restore the node from the seed, you might want to start with an SQLite backend, which is more resilient against those sorts of issues.

I'm closing the issue, since there's not really anything more to do here.

hieblmi commented on May 26, 2024

Thanks for flagging this issue. Indeed this takes a long time. Do you know how big the db was when you started the compaction? Have you ever compacted the db before?

bensig commented on May 26, 2024

I believe the db is compacted automatically whenever the process is restarted.

Last restart appears to have happened on Jan 31:

2024-01-31T00:43:15Z

Unsure when that last occurred before this.

Here is my lnd/data/graph/mainnet dir (user redacted):

-rw-r--r-- 1 user user 1023M Mar  5 10:09 /home/user/.lnd/data/graph/mainnet/channel.db
-rw------- 1 user user     8 Jan 30 18:51 /home/user/.lnd/data/graph/mainnet/channel.db.last-compacted
-rw-r--r-- 1 user user  132M Mar  5 10:08 /home/user/.lnd/data/graph/mainnet/sphinxreplay.db
-rw------- 1 user user     8 Jan 30 18:51 /home/user/.lnd/data/graph/mainnet/sphinxreplay.db.last-compacted
-rw-r--r-- 1 user user   65M Mar  8 04:11 /home/user/.lnd/data/graph/mainnet/temp-dont-use.db
-rw-r--r-- 1 user user   33M Mar  5 10:08 /home/user/.lnd/data/graph/mainnet/wtclient.db
-rw------- 1 user user     8 Jan 30 18:51 /home/user/.lnd/data/graph/mainnet/wtclient.db.last-compacted

bensig commented on May 26, 2024

I have now tried to restart without compacting the bbolt database to see if that helps, but it seems to be stuck at "opening" for over an hour already. I'm really not sure about the difference between the bbolt db and the channel db, or what could be causing this...

Changed lnd.conf to:

db.bolt.auto-compact=false

logs:

2024-03-08 03:48:17.282 [INF] LTND: Opening the main database, this might take a few minutes...
2024-03-08 03:48:17.282 [INF] LTND: Opening bbolt database, sync_freelist=false, auto_compact=false
2024-03-08 04:48:26.182 [INF] LTND: Opening the main database, this might take a few minutes...
2024-03-08 04:48:26.182 [INF] LTND: Opening bbolt database, sync_freelist=false, auto_compact=false

C-Otto commented on May 26, 2024

It might help (a lot) to run cat channel.db > /dev/null while lnd is starting/compacting. It seems your system only managed to write 65 MByte after 18 hours of compacting, you might want to check your disk (or storage backend) health.
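A rough sketch of that suggestion: read the file once so it lands in the page cache, and time the read to get a first hint about storage throughput. The helper name `read_through` is hypothetical, and a slow read here is only a hint, not a substitute for a proper disk health check.

```shell
#!/usr/bin/env bash

# read_through FILE: read FILE sequentially (same effect as
# `cat FILE > /dev/null`) and report how long the read took.
# (Hypothetical helper for illustration.)
read_through() {
    local db_file="$1"
    local start end bytes
    # Arithmetic expansion strips the padding some `wc` versions emit.
    bytes=$(( $(wc -c < "$db_file") ))
    start=$(date +%s)
    cat "$db_file" > /dev/null
    end=$(date +%s)
    echo "read ${bytes} bytes in $((end - start))s"
}

# Usage sketch (path is an assumption):
#   read_through ~/.lnd/data/graph/mainnet/channel.db
```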

C-Otto commented on May 26, 2024

The message "Opening the main database" should only appear once during startup. In your logs it is shown twice, with almost exactly one hour in between. Do you maybe re-start lnd automatically (if it doesn't start up completely within a timeout)?

bensig commented on May 26, 2024

It might help (a lot) to run cat channel.db > /dev/null while lnd is starting/compacting. It seems your system only managed to write 65 MByte after 18 hours of compacting, you might want to check your disk (or storage backend) health.

Disk health is good. This is running on a server cluster that is quite healthy.

Running cat /home/user/.lnd/data/graph/mainnet/channel.db >> /dev/null does not seem to help.

Log keeps repeating - I assume this is lnd crashing and the service auto-restarting:

2024-03-08 06:57:12.929 [INF] LTND: Version: 0.17.3-beta commit=v0.17.3-beta, build=production, logging=default, debuglevel=info
2024-03-08 06:57:12.929 [INF] LTND: Active chain: Bitcoin (network=mainnet)
2024-03-08 06:57:12.930 [INF] RPCS: RPC server listening on [::]:10009
2024-03-08 06:57:12.930 [INF] RPCS: RPC server listening on 0.0.0.0:10009
2024-03-08 06:57:12.935 [INF] RPCS: gRPC proxy started at [::]:10080
2024-03-08 06:57:12.935 [INF] RPCS: gRPC proxy started at 0.0.0.0:10080
2024-03-08 06:57:12.935 [INF] LTND: Opening the main database, this might take a few minutes...
2024-03-08 06:57:12.935 [INF] LTND: Opening bbolt database, sync_freelist=false, auto_compact=false
2024-03-08 06:57:24.181 [INF] LTND: Version: 0.17.3-beta commit=v0.17.3-beta, build=production, logging=default, debuglevel=info
2024-03-08 06:57:24.181 [INF] LTND: Active chain: Bitcoin (network=mainnet)
2024-03-08 06:57:24.182 [INF] RPCS: RPC server listening on [::]:10009
2024-03-08 06:57:24.182 [INF] RPCS: RPC server listening on 0.0.0.0:10009
2024-03-08 06:57:24.185 [INF] RPCS: gRPC proxy started at [::]:10080
2024-03-08 06:57:24.186 [INF] RPCS: gRPC proxy started at 0.0.0.0:10080
2024-03-08 06:57:24.186 [INF] LTND: Opening the main database, this might take a few minutes...
2024-03-08 06:57:24.186 [INF] LTND: Opening bbolt database, sync_freelist=false, auto_compact=false
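To confirm a restart loop like the one in the log above, it can help to count how often the startup banner appears and when. A minimal sketch, assuming the log format shown above; the helper names `count_startups` and `startup_times` are hypothetical, and the log path in the usage note is an assumption.

```shell
#!/usr/bin/env bash

# count_startups LOGFILE: number of times lnd logged its version banner,
# i.e. how many times the daemon started. `|| true` keeps the exit status
# at zero when there are no matches (grep still prints 0).
count_startups() {
    grep -c 'LTND: Version:' "$1" || true
}

# startup_times LOGFILE: date and time of each startup, one per line.
startup_times() {
    grep 'LTND: Version:' "$1" | cut -d' ' -f1,2
}

# Usage sketch (log path is an assumption):
#   count_startups ~/.lnd/logs/bitcoin/mainnet/lnd.log
#   startup_times  ~/.lnd/logs/bitcoin/mainnet/lnd.log
```

Startups only seconds apart, as in the excerpt above, suggest the process is exiting and being restarted by a supervisor rather than hanging.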

C-Otto commented on May 26, 2024

How do you start lnd? Maybe systemd is doing weird things?

guggero commented on May 26, 2024

Can you check the actual standard out of the daemon (and not just the log)? Do you see a panic/stack trace?

bensig commented on May 26, 2024

How do you start lnd? Maybe systemd is doing weird things?

Yeah I am using systemd services and docker...

bensig commented on May 26, 2024

Can you check the actual standard out of the daemon (and not just the log)? Do you see a panic/stack trace?

How do I do that exactly?
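One possible way to do this with a systemd and Docker setup is to pull the daemon's standard output from the journal or the container log and keep only the panic and stack trace. This is a sketch under assumptions: the unit and container name `lnd` is a guess, and `extract_panic` is a hypothetical helper.

```shell
#!/usr/bin/env bash

# extract_panic: read daemon output on stdin and print everything from the
# first line containing "panic:" to the end of the input, which covers the
# Go stack trace that follows it. Prints nothing if there is no panic.
extract_panic() {
    sed -n '/panic:/,$p'
}

# Usage sketch (unit/container name "lnd" is an assumption):
#   journalctl -u lnd -o cat --no-pager | extract_panic
#   docker logs lnd 2>&1 | extract_panic
```

Running lnd directly in a terminal, outside the service manager, would also show the same output on screen.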

guggero commented on May 26, 2024

Ugh, that looks like data corruption in the freepages. Did you attempt the steps in this comment? #8532 (comment)

bensig commented on May 26, 2024

Ugh, that looks like data corruption in the freepages. Did you attempt the steps in this comment? #8532 (comment)

Yes, actually I did try it. I ran this a few times:

chantools compactdb --sourcedb channel.db --destdb ./results/compacted.db

...and it seems to stall with no errors in the log and the compacted version seems to be stuck at a size of 57M.

I'll run it in a screen for a few hours to see if it is still stuck at 57M.
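Rather than eyeballing the size, a small check can tell whether the output file is still growing. A minimal sketch; `file_size` and `is_growing` are hypothetical helpers, and a file that is not growing only suggests a stall, since the tool may be busy without writing.

```shell
#!/usr/bin/env bash

# file_size FILE: size in bytes. The arithmetic expansion strips the
# padding that some `wc` implementations add.
file_size() {
    echo $(( $(wc -c < "$1") ))
}

# is_growing FILE SECONDS: succeed if FILE got larger during the interval.
is_growing() {
    local before after
    before=$(file_size "$1")
    sleep "$2"
    after=$(file_size "$1")
    [ "$after" -gt "$before" ]
}

# Usage sketch (path taken from the command above, interval is arbitrary):
#   if is_growing ./results/compacted.db 600; then
#       echo "still writing"
#   else
#       echo "no growth in 10 minutes"
#   fi
```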

bensig commented on May 26, 2024

Does not seem to pass 57M @guggero @hieblmi

Any other ideas?

Or do I just clear data and restore seed and channel backup...

bensig commented on May 26, 2024

What are the steps to restore from seed and scb?

hieblmi commented on May 26, 2024

You can find the steps here https://docs.lightning.engineering/lightning-network-tools/lnd/disaster-recovery.
