
Comments

kamikaze commented on May 20, 2024 (+46)

This project is going to die with such speed and priorities; don't waste your time, guys. Fork it.


secwall commented on May 20, 2024 (+22)

@kamikaze @soshnikov, could you kindly stop the blaming? @gbartolini explained why incremental backups have not been merged yet. We want this feature in the mainline because we lack the resources to support our own fork.


RealLord commented on May 20, 2024 (+19)

Hm... It's extremely strange that the one feature that can provide more backup performance than all other PostgreSQL backup software is still a work in progress, not in production.


gbartolini commented on May 20, 2024 (+17)

In this period we have been extremely busy releasing version 2.0, with all the new features you are aware of. In order to include this patch in Barman, we drafted a plan with @secwall that included several code contributions, some of which have already been implemented.

The difficulty with this patch is, as we have said in the past, integrating it with every existing use case of Barman without breaking backward compatibility. Also, our approach is to reach the goal through an incremental process.

The next step will be to add parallel backup (v3?), which should be quite straightforward now with the CopyController infrastructure, and then to integrate @secwall's work on a remote PostgreSQL server with an agent (for this reason, too, we have created the barman-cli package).

I hope that with this message you can clearly see our commitment and our efforts towards this goal. Of course, having a stakeholder willing to fund the development of such a feature would raise its priority and allow us to deliver it in a shorter timeframe.


secwall commented on May 20, 2024 (+2)

Hmm. It seems that there is no discussion. Let's move on to specific questions:

  1. Is the issue of many datafiles changing while not so many pages change quite common (i.e. do we need page-level incremental backups in Barman)?
  2. Running a script over SSH on the PostgreSQL database host might not be such a good idea. Are there other ways of making page-level incremental backups possible?
  3. If the current approach is OK, what should be fixed in my fork before merging? (Code style in barman-incr, unit tests and docs, anything else?)


FractalizeR commented on May 20, 2024 (+2)

The Yandex guys said here that the Barman authors are asking for money to merge this feature.

Уже почти год прошел, как мы их просим запилить эту киллер-фичу, а они просят с нас денег, чтобы замержить её.

Almost a year has passed since we started asking [the Barman team] to implement this killer feature, and they are asking us for money to merge it.

Translation into English is mine.

[screenshot of the quoted post, dated 2017-02-17]

Can someone elaborate on what the problem is? Where did this money question come from? I think this is just a misunderstanding, right?


AntonBushmelev commented on May 20, 2024 (+2)

Hello guys, any news on implementing this killer feature?


man-brain commented on May 20, 2024 (+2)

I suppose that this issue should be closed, since nothing has been done for 2.5 years. We have merged all these features into the wal-g upstream and we will not support our Barman fork any more.


man-brain commented on May 20, 2024

Any thoughts, guys?


gbartolini commented on May 20, 2024

Hi,

first of all, thanks for your contribution. We are currently 100% focused on Barman 1.6.0 with streaming replication support. Hence we apologise for not responding any earlier.

As far as this is concerned, our ultimate goal is to have this feature in PostgreSQL's core (pg_basebackup), rather than having it as part of Barman - you can see our previous attempts at this in the PostgreSQL hackers list.

However, having said this, we were discussing your patch over lunch just yesterday, and one idea that came up was to add a function in pgespresso that returns the content of a requested block in a file (or of a list of blocks). This would avoid installing an agent on the Postgres server.
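
For illustration only, here is a rough sketch of how a backup client might consume such a function. This is purely hypothetical: pgespresso has no such function today, and pgespresso_read_block() is an assumed name and signature.

# Hypothetical sketch of the pgespresso idea above: fetch individual
# blocks through a server-side SQL function instead of running an
# agent over SSH. pgespresso_read_block() does NOT exist; its name
# and signature are assumptions for illustration only.
import psycopg2

def fetch_blocks(conninfo, relpath, block_numbers):
    # yield (block number, raw block bytes) for the given relation file
    with psycopg2.connect(conninfo) as conn:
        with conn.cursor() as cur:
            for blkno in block_numbers:
                cur.execute(
                    "SELECT pgespresso_read_block(%s, %s)",  # hypothetical function
                    (relpath, blkno),
                )
                yield blkno, bytes(cur.fetchone()[0])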

Please bear with us; we will do our best to evaluate your code, but it won't be any time soon.

Thanks,
Gabriele

Gabriele Bartolini - 2ndQuadrant Italia - Managing Director
PostgreSQL Training, Services and Support
[email protected] | www.2ndQuadrant.it


man-brain commented on May 20, 2024

We are currently 100% focused on Barman 1.6.0 with streaming replication support. Hence we apologise for not responding any earlier.

No problem, guys, although we are doing lots of rebasing :) You are doing the right work, thanks!

As far as this is concerned, our ultimate goal is to have this feature in PostgreSQL's core (pg_basebackup), rather than having it part of Barman - you can see our previous attempts at this in the hackers list of PostgreSQL.

Yep, we've seen that, but it seems that you gave up on it after you didn't have time to push it into 9.5. Having it in core PostgreSQL would be really great, but our change brings not only increments: it also brings parallelism and compression. These two changes are really important for quite big databases. Rsync and pg_basebackup support compression, but right now you hit either the network bandwidth (no compression) or the speed of one CPU core (with compression). We launch several processes so that we can utilize all resources with maximum efficiency and flexibility.
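
As an illustration, a minimal sketch of the multi-process compression idea described above (the helper names, gzip level, and worker count are illustrative, not the fork's actual code):

import gzip
import shutil
from multiprocessing import Pool

def compress_one(path):
    # roughly "gzip -3": cheap compression, one file per worker process
    with open(path, 'rb') as src:
        with gzip.open(path + '.gz', 'wb', compresslevel=3) as dst:
            shutil.copyfileobj(src, dst)
    return path + '.gz'

def compress_all(paths, workers=8):
    # several worker processes sidestep the one-CPU-core gzip bottleneck
    with Pool(processes=workers) as pool:
        return pool.map(compress_one, paths)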

... one idea that came up was to add a function in pgespresso that returns the content of a requested block in a file (or of a list of blocks). This would avoid installing an agent on the Postgres server.

Yes, we really do want to avoid the need to install anything else on the database servers, but implementing such a thing in pgespresso (or in another extension using libpq) may not be a good decision. It would be quite difficult (though possible) to preserve parallelism, and it would make restore much more complicated. Actually, most of the restore logic (decompression and merging of increments) would then be done on the backup server rather than on the database host, which seems a bit odd.


secwall commented on May 20, 2024

Hello, guys.
I see the 1.6.0 release, so could we continue our discussion?
As @Dev1ant mentioned, moving the logic into pgespresso would make recovery more complex.
Also, DB hosts have more CPU power and faster disks in our environment, so it's better to perform heavy operations on them (in our tests, recovery with barman-incr on the DB host is about 3 times faster than on the Barman host). And this seems to be quite a common case.


man-brain commented on May 20, 2024

Any chance you will take a look at it, guys?


secwall commented on May 20, 2024

We started using the fork with incremental backups in production.
Here are some numbers.
Our typical database looks like this (pgdata is about 5 TiB):

root@xdb2011g ~ # df -h | grep pgsql
/dev/md4         14T  5.0T  8.1T  39% /var/lib/pgsql/9.4/data
/dev/md3        189G   82G   98G  46% /var/lib/pgsql/9.4/data/pg_xlog

Its backups look like this (we use gzip -3 for backup compression and gzip -6 for WAL compression):

root@pg-backup05i ~ # barman list-backup xdb2011
xdb2011 20160330T020103 - Wed Mar 30 03:53:47 2016 - Size: 51.0 GiB - WAL Size: 60.8 GiB
xdb2011 20160329T020103 - Tue Mar 29 03:51:44 2016 - Size: 50.3 GiB - WAL Size: 114.8 GiB
xdb2011 20160328T020103 - Mon Mar 28 03:45:12 2016 - Size: 52.3 GiB - WAL Size: 112.8 GiB
xdb2011 20160327T020103 - Sun Mar 27 09:50:25 2016 - Size: 1.0 TiB - WAL Size: 88.7 GiB
xdb2011 20160326T020102 - Sat Mar 26 04:52:37 2016 - Size: 58.4 GiB - WAL Size: 122.1 GiB
xdb2011 20160325T020102 - Fri Mar 25 03:42:46 2016 - Size: 58.9 GiB - WAL Size: 122.6 GiB
xdb2011 20160324T020103 - Thu Mar 24 03:38:19 2016 - Size: 39.0 GiB - WAL Size: 126.5 GiB
xdb2011 20160323T020103 - Wed Mar 23 04:39:37 2016 - Size: 33.5 GiB - WAL Size: 82.2 GiB
xdb2011 20160322T020103 - Tue Mar 22 04:51:06 2016 - Size: 33.0 GiB - WAL Size: 76.1 GiB - OBSOLETE*
xdb2011 20160321T020103 - Mon Mar 21 04:20:11 2016 - Size: 28.2 GiB - WAL Size: 74.2 GiB - OBSOLETE*
xdb2011 20160320T020106 - Sun Mar 20 09:22:48 2016 - Size: 971.3 GiB - WAL Size: 48.4 GiB - OBSOLETE*

We start backups at 02:00, so a full backup takes about 7-8 hours and an incremental backup takes about 3 hours (we could get a speed-up here by using block change tracking, but it is not ready yet). Backups + WALs for a recovery window of 1 week consume about 3.3 TB for this database.


gbartolini commented on May 20, 2024

Hi guys,

I have to apologise again, but as you might have noticed, adding streaming replication support has taken longer than just 1.6.0! We have just released 1.6.1 and are working on 1.6.2/1.7.0 which will hopefully bring full pg_basebackup support and streaming-only backup solutions (suitable for PostgreSQL on Docker and Windows environments too).

Your patch is definitely very interesting, but until we have completed support for streaming-only backups we have to postpone the review and the integration (mainly for testing purposes).

However, I thank you again for your interest and your efforts.

Ciao,
Gabriele


gbartolini commented on May 20, 2024

While looking at your patch, I have been thinking about two possible ideas:

  1. Do you think you can isolate the lzma patch, so that we can include it separately in Barman's core?
  2. I'd suggest keeping the remote 'barman-incr' as a separate script; it could even be a more generic barman-agent script that would be executed on the Postgres server via SSH

Thanks again,
Gabriele


secwall commented on May 20, 2024

Hello.

  1. Maybe we could import lzma only when lzma compression is requested by the user (and return an error if the module is unavailable); is this approach OK? (lzma is currently used only in barman-incr; I didn't change the WAL compression part.) See the sketch after this list.
  2. It seems that I don't understand this part: barman-incr is actually in a separate package (https://github.com/secwall/barman/blob/master/rpm/barman.spec#L49-55) and it is indeed executed on the PostgreSQL server via SSH (https://github.com/secwall/barman/blob/master/barman/backup_executor.py#L774-783)
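
A minimal sketch of the lazy-import approach in point 1 (the helper name and error handling here are illustrative, not the actual barman-incr code):

def get_compressor(name):
    # import lzma lazily, only when the user actually asked for it
    if name == 'lzma':
        try:
            import lzma
        except ImportError:
            raise RuntimeError(
                'lzma compression requested, but the lzma module is '
                'not available in this Python installation')
        return lzma.LZMACompressor()
    raise ValueError('unknown compression method: %s' % name)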


man-brain commented on May 20, 2024

Any success here, guys? Very soon this will be a one-year-old open PR...


FractalizeR commented on May 20, 2024

Yep, sure, I can see that now. Sorry for the late reply.


s200999900 commented on May 20, 2024

Hi!

Sorry for my poor English )

This is a very helpful feature!

I suggest looking at the "borg backup" project as a storage backend:
https://github.com/borgbackup/borg
It has a lot of good backup functionality: encryption, compression, deduplication, SSH as a transport...

There is no Python API for now, but it is possible to run it through a wrapper script for the create, restore, check, and list backup operations, as sketched below.
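
For example, a minimal sketch of such a wrapper (borg create/list/check/extract are real borg subcommands; the repository path and helper names are illustrative):

import subprocess

REPO = '/backup/borg-repo'  # illustrative repository path

def borg(*args, **kwargs):
    # thin wrapper around the borg CLI, since there is no Python API
    subprocess.run(['borg', *args], check=True, **kwargs)

def create_backup(archive, pgdata):
    borg('create', '%s::%s' % (REPO, archive), pgdata)

def list_backups():
    borg('list', REPO)

def check_repo():
    borg('check', REPO)

def restore_backup(archive, dest):
    # borg extract restores into the current working directory
    borg('extract', '%s::%s' % (REPO, archive), cwd=dest)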

I can help with testing in that way, but I need some help with the right instructions to do so.


kamikaze commented on May 20, 2024

I suppose this project should be closed, since nothing has been done for 2.5 years.


amenonsen commented on May 20, 2024

It's a pity this feature was not merged, especially because the patch (even today) looks really nicely done.

That said, with the benefit of several years of hindsight: scanning page headers to detect changes based on the LSN is a lot faster than rsync, but still too expensive for very large data directories. We know there are extensions like ptrack that take a more proactive approach to recording changes, and that seems like the right approach going forward.
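
For readers unfamiliar with the technique, a minimal sketch of such an LSN-based scan over one relation file (assuming the default 8 KiB block size and little-endian storage; the real barman-incr handles many more details):

import struct

BLCKSZ = 8192  # default PostgreSQL block size

def changed_blocks(path, since_lsn):
    # pd_lsn occupies the first 8 bytes of every page header; a page
    # whose LSN is newer than the previous backup's start LSN has
    # changed and must be included in the incremental backup
    with open(path, 'rb') as f:
        blkno = 0
        while True:
            page = f.read(BLCKSZ)
            if len(page) < BLCKSZ:
                break
            xlogid, xrecoff = struct.unpack_from('<II', page, 0)
            lsn = (xlogid << 32) | xrecoff
            if lsn > since_lsn:
                yield blkno
            blkno += 1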

Meanwhile, this project is now under active maintenance again, but I'll close this issue because there's no point in leaving it open. I do hope to support incremental backups, but we (still!) hope that core Postgres will eventually provide a feature that Barman can use to do so.

