
barman's People

Contributors

amenonsen, barthisrael, d0b3rm4n, didiermichel, dulhaver, fcanovai, gbartolini, gcalacoci, github-actions[bot], gonzalemario, ibarwick, jeffjanes, jthreefoot-edb, leonardoce, martinmarques, mgalgs, mhkarimi1383, mikewallace1979, mnencia, moench-tegeder, mswiech, patrickbucher, richyen, rubensts, sjuls, steamraven, stratakis, theadamwright, thoro, zacchiro


barman's Issues

backups failing with python error

Another failing backup problem. We're seeing this pretty often, too. This is backing up a postgres 9.1.9 instance with barman 1.5.1 running with Python 2.6.6 (system python on Centos6) installed using pip.

2015-11-23 01:00:54,147 [8642] barman.backup_executor INFO: Copy done.
2015-11-23 01:00:54,151 [8642] barman.backup_executor INFO: Asking PostgreSQL server to finalize the backup.
2015-11-23 01:00:54,839 [8642] barman.backup ERROR: Backup failed issuing start backup command.
DETAILS: Cannot terminate exclusive backup. You might have to  manually execute pg_stop_backup() on your PostgreSQL server
2015-11-23 01:00:55,455 [32660] barman.cli ERROR: 'NoneType' object has no attribute 'rfind'
See log file for more details.
Traceback (most recent call last):
  File "/usr/local/barman/lib/python2.6/site-packages/barman-1.5.1-py2.6.egg/barman/cli.py", line 865, in main
    p.dispatch(pre_call=global_config)
  File "build/bdist.linux-x86_64/egg/argh/helpers.py", line 55, in dispatch
    return dispatch(self, *args, **kwargs)
  File "build/bdist.linux-x86_64/egg/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "build/bdist.linux-x86_64/egg/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "build/bdist.linux-x86_64/egg/argh/dispatching.py", line 231, in _call
    result = function(namespace_obj)
  File "/usr/local/barman/lib/python2.6/site-packages/barman-1.5.1-py2.6.egg/barman/cli.py", line 470, in list_files
    for line in backup_id.get_list_of_files(args.target):
  File "/usr/local/barman/lib/python2.6/site-packages/barman-1.5.1-py2.6.egg/barman/infofile.py", line 524, in get_list_of_files
    for x in self.get_required_wal_segments():
  File "/usr/local/barman/lib/python2.6/site-packages/barman-1.5.1-py2.6.egg/barman/xlog.py", line 163, in enumerate_segments
    end_tli, end_log, end_seg = decode_segment_name(end)
  File "/usr/local/barman/lib/python2.6/site-packages/barman-1.5.1-py2.6.egg/barman/xlog.py", line 125, in decode_segment_name
    name = os.path.basename(path)
  File "/usr/lib64/python2.6/posixpath.py", line 111, in basename
    i = p.rfind('/') + 1
AttributeError: 'NoneType' object has no attribute 'rfind'
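
The traceback suggests that the end WAL name passed to the decoder was None, which would be the case for a backup that never finished. A minimal sketch of a defensive check follows; `decode_segment_name_checked` is a hypothetical helper for illustration, not Barman's actual function.

```python
import os
import re

# A WAL segment name is 24 hexadecimal digits: timeline, log, segment.
_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def decode_segment_name_checked(path):
    """Return (timeline, log, segment) or raise a descriptive ValueError."""
    if path is None:
        # Assumption: a missing name means the backup never recorded its end WAL.
        raise ValueError("WAL segment name is missing (backup may be incomplete)")
    name = os.path.basename(path)
    if not _SEGMENT_RE.match(name):
        raise ValueError("invalid WAL segment name: %r" % name)
    # Each of the three fields is 8 hex digits wide.
    return tuple(int(name[i:i + 8], 16) for i in (0, 8, 16))
```

With a check like this, the command could report an incomplete backup instead of failing with an AttributeError.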

Strange comparison last_failed_wal <= last_archived_wal

Problem with false positive "continuous archiving: FAILED".

It is similar to this issue: https://sourceforge.net/p/pgbarman/tickets/77/, which was resolved by this commit: dcb22e8.

Environment:

  • barman 1.6.0
  • postgres 9.4.7

Description

There is some info from my pg_stat_archiver:

postgres=# select last_archived_wal, last_archived_time, last_failed_wal, last_failed_time from pg_stat_archiver;
    last_archived_wal     |      last_archived_time       |     last_failed_wal      |       last_failed_time        
--------------------------+-------------------------------+--------------------------+-------------------------------
 000000010000006D0000004D | 2016-05-31 02:04:20.222315+02 | 000000010000006700000094 | 2016-05-22 02:23:37.379004+02

But when I try part of your select (last_failed_wal <= last_archived_wal), it gives me FALSE:

postgres=# select last_failed_wal <= last_archived_wal as result from pg_stat_archiver;
 result 
--------
 f

You can also try select '000000010000006700000094' <= '000000010000006D0000004D';
This is the same as select '7' < 'D';

I think this comparison is wrong, because 000000010000006700000094 is in fact older than 000000010000006D0000004D.

Should the comparison last_failed_wal <= last_archived_wal be replaced with last_failed_time <= last_archived_time?
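
One possible explanation, not confirmed in this report, is that text comparison follows the database collation, and some collations do not sort digits before letters. A minimal sketch of a collation-independent comparison parses the three hexadecimal fields of each WAL name and compares them as integers. The helper name `wal_sort_key` is hypothetical.

```python
def wal_sort_key(name):
    """Split a 24-hex-digit WAL name into integer (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError("not a WAL segment name: %r" % name)
    return (int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16))


# Values taken from the pg_stat_archiver output above
last_failed = "000000010000006700000094"
last_archived = "000000010000006D0000004D"

# Tuples compare field by field, so 0x67 < 0x6D decides the result.
failed_is_older = wal_sort_key(last_failed) <= wal_sort_key(last_archived)
```

Here `failed_is_older` is True, which matches the timestamps in the query output.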

Corrupted WALs after a blind server copy

We copied our PostgreSQL server to a 2nd machine to do tests and troubleshooting.
Unfortunately, we forgot to turn off archive_mode before starting PostgreSQL.

The consequence is that both servers rsync their WAL files to the same incoming/ directory, and the incremental archive becomes unusable.

Maybe barman could give more prominent warnings in such a case.
Alternatively, the documentation could suggest an archive_command that prevents this mistake.

backups failing rsync missing files

Backups are failing alarmingly often for us because of this error:

2015-11-23 21:40:19,148 [21291] barman.backup ERROR: Backup failed copying files.
DETAILS: data transfer failure on directory '/usr/local/pgsql/data'
rsync error:
... list of files here ...
rsync: link_stat "/usr/local/pgsql/data/base/pgsql_tmp/pgsql_tmp14179.115567" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1505) [generator=3.0.6]

We are using Postgres 9.4.5, rsync 3.0.6, and barman 1.5.1. I see in the code that this error is supposed to be ignored, yet our backup definitely failed according to barman:

postgres@xxx:/usr/lusers/plockaby$ barman list-backup xxx
xxx 20151124T211803 - Tue Nov 24 21:50:25 2015 - Size: 38.5 GiB - WAL Size: 897.7 MiB
xxx 20151123T211803 - FAILED
xxx 20151122T211803 - Sun Nov 22 21:39:20 2015 - Size: 33.8 GiB - WAL Size: 20.0 GiB
xxx 20151121T211804 - Sat Nov 21 21:38:43 2015 - Size: 33.4 GiB - WAL Size: 6.4 GiB
xxx 20151120T211803 - FAILED
xxx 20151119T211803 - FAILED
xxx 20151119T171220 - Thu Nov 19 17:44:21 2015 - Size: 32.7 GiB - WAL Size: 15.6 GiB
xxx 20151119T112806 - Thu Nov 19 11:44:58 2015 - Size: 32.5 GiB - WAL Size: 2.8 GiB
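
As an illustration only (this is not Barman's actual logic), the idea of ignoring files that vanish during the copy can be sketched as a filter over rsync's error output. The function name and the patterns below are assumptions based on the messages quoted in this report.

```python
import re

# rsync reports a file that disappeared during the transfer like this.
_VANISHED_RE = re.compile(
    r'^rsync: link_stat ".*" failed: No such file or directory \(2\)$')


def only_vanished_files(stderr_text):
    """True if every rsync diagnostic refers to a file that no longer exists."""
    saw_vanished = False
    for line in stderr_text.splitlines():
        line = line.strip()
        if not line or line.startswith("rsync error:"):
            continue  # blank line or the final summary line
        if _VANISHED_RE.match(line):
            saw_vanished = True
        elif line.startswith("rsync:"):
            return False  # some other rsync diagnostic: treat as a real failure
    return saw_vanished
```

A caller could treat exit code 23 as non-fatal only when this returns True, and fail the backup otherwise.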

Timeline unawareness

In case of timeline switch between two base backups, Barman is unable to follow the new timeline while associating WAL files to a backup.

A notable symptom is that in "barman show-backup" the "Last available" WAL is not updated.

This issue might be related to #29

replication-status does not work with PostgreSQL 9.1

When executing replication-status I get a misleading exception message (unable to connect).

The reason is that the replication-status code uses pg_xlog_location_diff, which was introduced in 9.2.

2016-07-12 16:06:02,790 [7734] barman.postgres DEBUG: Error retrieving status of standby servers: function pg_xlog_location_diff(text, text) does not exist
LINE 1: ...te , CASE WHEN pg_is_in_recovery() THEN NULL ELSE pg_xlog_lo...
                                                             ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
2016-07-12 16:06:02,812 [7734] barman.server ERROR: Unable to connect to server XXXXX
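
A minimal sketch of a version gate that would produce a clearer message follows. It assumes the caller already has the integer reported by PostgreSQL's server_version_num setting; the function names are hypothetical.

```python
# pg_xlog_location_diff was introduced in PostgreSQL 9.2 (version number 90200).
MIN_VERSION_XLOG_LOCATION_DIFF = 90200


def can_use_xlog_location_diff(server_version_num):
    """True if the server is new enough to provide pg_xlog_location_diff."""
    return server_version_num >= MIN_VERSION_XLOG_LOCATION_DIFF


def replication_status_error(server_version_num):
    """Return None when supported, else a message naming the real cause."""
    if can_use_xlog_location_diff(server_version_num):
        return None
    return ("replication-status requires PostgreSQL 9.2 or later "
            "(server reports version number %d)" % server_version_num)
```

For a 9.1.9 server (version number 90109) this returns a message about the unsupported version rather than "unable to connect".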

PostgreSQL failed to start after barman recover

We have a PostgreSQL replication cluster consisting of a primary server and several standby servers, and we use Barman 1.6 on a separate server to manage our backups. The primary PG server is configured to archive WAL files to the barman server. We also run base backups of the primary server from the barman server.

Recently, we upgraded the PG replication cluster from 9.4 to 9.5. After the upgrade to 9.5, WAL file archiving resumed without any issues, and base backups completed successfully. We validate our backups by performing "barman recover" operations from the barman server to a remote server. Although the barman recover command ran successfully, PostgreSQL failed to start on the remote host. Error from the PG log: FATAL,XX000,"could not locate required checkpoint record"

I checked the output from the barman recover command, and couldn't find the Begin WAL or End WAL files anywhere on the barman server. Investigating further, I checked the barman wals folder on the barman server, and discovered that the 9.5 WAL files reported as archived in our barman.log file no longer exist in the wals folder. However, all of the WAL files that were archived for 9.4 still exist in the wals folder. Apparently, barman deleted the 9.5 WAL files after we took our nightly base backup of the primary server.

barman check z-prod
Server z-prod:
PostgreSQL: OK
wal_level: OK
directories: OK
retention policy settings: OK
backup maximum age: OK (no last_backup_maximum_age provided)
compression settings: OK
failed backups: OK (there are 0 failed backups)
minimum redundancy requirements: OK (have 18 backups, expected at least 0)
ssh: OK (PostgreSQL server)
not in recovery: OK
archive_mode: OK
archive_command: OK
continuous archiving: OK
archiver errors: OK

Backup listed as failed due to rsync - false positive?

We had a backup reported as FAILED, with the following error in the logs:

rsync: read errors mapping "/our/pgdata/dir/base/16389/4452859": No data available (61)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1518) [generator=3.0.9]

As far as I can see, rsync can throw this error on files that are truncated during transfer. Our setup has a lot of volatile (bursts of) data, where it is likely that (auto)vacuum can truncate a file once in a while.

During the backup, the number of dead tuples decreased from 130K to about 30K, which makes it more likely it was actually a truncate due to a vacuum.
I don't have logs explicitly stating that it was a vacuum however.

barman replication-status exception

When I run barman replication-status server-id, I get this error:
EXCEPTION: 'Record' object has no attribute 'slot_name'
See log file for more details.
The log:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/barman/cli.py", line 1022, in main
    p.dispatch(pre_call=global_config)
  File "/usr/lib/python2.7/dist-packages/argh/helpers.py", line 53, in dispatch
    return dispatch(self, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/argh/dispatching.py", line 125, in dispatch
    for line in lines:
  File "/usr/lib/python2.7/dist-packages/argh/dispatching.py", line 202, in _execute_command
    for line in result:
  File "/usr/lib/python2.7/dist-packages/argh/dispatching.py", line 158, in _call
    result = args.function(args)
  File "/usr/lib/python2.7/dist-packages/barman/cli.py", line 287, in replication_status
    server.replication_status(args.target)
  File "/usr/lib/python2.7/dist-packages/barman/server.py", line 1791, in replication_status
    standby_info)
  File "/usr/lib/python2.7/dist-packages/barman/output.py", line 254, in result
    _dispatch(_writer, 'result', command, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/barman/output.py", line 127, in _dispatch
    return handler(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/barman/output.py", line 839, in result_replication_status
    if standby.slot_name:
AttributeError: 'Record' object has no attribute 'slot_name'
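
A sketch of a tolerant attribute lookup follows, assuming the record simply lacks a slot_name field on servers that do not report replication slots. `StandbyRecord` is a stand-in type for illustration, not Barman's actual record class.

```python
from collections import namedtuple

# Stand-in for a row that has no slot_name column.
StandbyRecord = namedtuple("StandbyRecord", ["application_name", "state"])


def slot_name_of(standby):
    """Return the slot name, or None when the record has no such field."""
    return getattr(standby, "slot_name", None)


standby = StandbyRecord(application_name="standby1", state="streaming")
slot = slot_name_of(standby)  # None instead of an AttributeError
```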

How to verify backups?

I'm still learning barman and postgres backups, and I'm lost on how to validate that everything is running fine. I have everything set up with docker containers and cron jobs that run barman cron every minute and barman backup every 15 minutes.

Since the clusters are large (about 50 GB each), the backup part takes a long time and uses a lot of resources. After reading more docs, I realized I don't need full backups every 15 minutes, since the WAL files have everything needed for PITR. However, I do not understand how you know whether your "backup chain" is complete...

If I had barman backup running every 24 hours but stopped barman for a few hours, and Postgres deleted some WAL files, what happens when I restart barman? Will it warn me or let me know about the issue somehow?

If I understand it correctly, wal_keep_segments=150 would always keep 150 WAL files just in case, but if postgres needed to create (e.g.) 200 WAL files, it wouldn't keep the extra 50, right?
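
One way to reason about a complete chain is to check that archived WAL segment names form an unbroken sequence. The sketch below is an illustration under stated assumptions: the input holds only 24-character segment names (no .history files), segments are the default 16 MB, and the server is PostgreSQL 9.3 or later, so each log holds 256 segments. The function names are hypothetical, and this is not how Barman itself verifies an archive.

```python
def _decode(name):
    """Split a WAL name into integer (timeline, log, segment)."""
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def _successor(tli, log, seg, segments_per_log):
    """Return the segment that should follow (tli, log, seg)."""
    seg += 1
    if seg >= segments_per_log:
        return tli, log + 1, 0
    return tli, log, seg


def find_wal_gaps(names, segments_per_log=256):
    """Return (previous, next) pairs between which segments are missing."""
    ordered = sorted(names, key=_decode)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        p, c = _decode(prev), _decode(cur)
        if p[0] != c[0]:
            continue  # timeline switch: not handled in this sketch
        if _successor(p[0], p[1], p[2], segments_per_log) != c:
            gaps.append((prev, cur))
    return gaps
```

An empty result means no segment is missing between the first and last name supplied. It says nothing about segments before the first or after the last.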

Fix barman status output

Fix the barman status command output:

  1. Sometimes Last archived WAL appears twice
# barman status main
Server main:
    Description: main PostgreSQL Database
    Active: True
    Disabled: False
    PostgreSQL version: 9.3.10
    pgespresso extension: Not available
    PostgreSQL Data directory: /pgdata
    PostgreSQL 'archive_command' setting: rsync -a %p barman@backup:/main/incoming/%f
    Last archived WAL: 00000003.history
    Current WAL segment: 00000004000038E0000000A5
    Retention policies: enforced (mode: auto, retention: REDUNDANCY 1, WAL retention: MAIN)
    No. of available backups: 1
    First available backup: 20160120T000004
    Last available backup: 20160120T000004
    Minimum redundancy requirements: satisfied (1/1)
    Last archived WAL: 00000003.history
  2. Last archived WAL could be inaccurate, showing an older file instead of the latest archived WAL.

EXCEPTION: unable to prepare tablespace (destination ''): ln command failed

Hello,

We are trying to prepare PostgreSQL 9.5 as a new production environment. But we are encountering issues with barman, which we don't get on PostgreSQL 9.4.

After every backup we automatically restore the database and run some tests on it. Because we are running multiple database instances, all installed via puppet, we need to relink the tablespaces (otherwise multiple databases would try to deploy into the same directories).
To enable this we use the '--tablespace' option: --tablespace=audit_null:/pgdata/dwhdevonhw/audit/null

This seems to work correctly, because this is what we see when restoring for said tablespace:
16422, audit_null, /pgdata/dwhdevonhw/audit/null

The restore creates the correct dir, and restores the data correctly:
$ ls -la /pgdata/dwhdevonhw/audit/null/
total 0
drwx------. 3 barman barman 37 Jun 27 16:41 .
drwxrwxr-x. 7 barman barman 87 Sep 27 12:52 ..
drwx------. 3 barman barman 26 Jun 27 16:42 PG_9.5_201510051

But the relinking didn't happen:
ls -la /mnt/data/restoretests/dwhdevonhw/pg_tblspc/16422
lrwxrwxrwx. 1 barman barman 18 Sep 27 13:45 /mnt/data/restoretests/dwhdevonhw/pg_tblspc/16422 -> /pgdata/audit/null

The odd thing is that this relinking does work on PostgreSQL 9.4

Does anyone have an idea what is going wrong?

Bert

fail to create backup label

Barman 1.6.0-1.pgdg14.04+1 fails to create file data/backup_label:

2016-04-20 16:25:54,556 [3939] barman.backup INFO: Starting backup for server standby in /var/lib/barman/standby/base/20160420T162554
2016-04-20 16:25:54,584 [3939] barman.backup_executor INFO:     1471071, tb_hdd_main, /var/lib/postgresql/tb_hdd
2016-04-20 16:30:24,061 [3939] barman.backup_executor INFO: Backup start at xlog location: 3B8/11410FC0 (00000001000003B800000011, 00410FC0)
2016-04-20 16:30:24,061 [3939] barman.backup_executor INFO: This is the first backup for server standby
2016-04-20 16:30:24,071 [3939] barman.backup_executor INFO: Copying files.
2016-04-20 16:30:24,072 [3939] barman.command_wrappers INFO: Smart copy: ':/var/lib/postgresql/tb_hdd/' -> '/var/lib/barman/standby/base/20160420T162554/147171' (ref: None, safe before None)
2016-04-20 16:30:24,072 [3939] barman.command_wrappers INFO: Smart copy step 1/4: preparation
2016-04-20 16:30:24,395 [3939] barman.command_wrappers INFO: Smart copy step 2/4: create directories and delete/copy unknown files
2016-04-20 16:30:24,559 [3939] barman.command_wrappers INFO: Smart copy step 3/4: safe copy
2016-04-20 16:34:17,863 [3939] barman.command_wrappers INFO: Smart copy finished: :/var/lib/postgresql/tb_hdd/ -> /var/lib/barman/standby/base/20160420T16255/1471071 (safe before None)
2016-04-20 16:34:17,883 [3939] barman.backup ERROR: Backup failed writing backup label.
DETAILS: [Errno 2] No such file or directory: '/var/lib/barman/standby/base/20160420T162554/data/backup_label'

Looks like mkdir /var/lib/barman/standby/base/20160420T162554/data would fix that.
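
The suggested fix can be sketched as creating the data directory before writing the label. The helper name, the throwaway directory and the placeholder label contents are assumptions for illustration, not Barman's code.

```python
import os
import shutil
import tempfile


def write_backup_label(backup_dir, content):
    """Write data/backup_label under backup_dir, creating data/ if needed."""
    data_dir = os.path.join(backup_dir, "data")
    if not os.path.isdir(data_dir):
        os.makedirs(data_dir)
    label_path = os.path.join(data_dir, "backup_label")
    with open(label_path, "w") as label_file:
        label_file.write(content)
    return label_path


# Demonstration with a throwaway directory standing in for the backup directory
demo_dir = tempfile.mkdtemp()
demo_label = write_backup_label(demo_dir, "placeholder label contents\n")
label_was_written = os.path.isfile(demo_label)
shutil.rmtree(demo_dir)
```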

Check PostgreSQL 10 versioning scheme

Starting with PostgreSQL 10, the PostgreSQL versioning scheme will switch from three components to two.

Check the Barman code for incorrect assumptions on PostgreSQL version string format.
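
As an illustration of the kind of assumption to look for, the sketch below converts a version string into the integer form used by server_version_num, handling both schemes. The function name is hypothetical; the numeric layout (90407 for 9.4.7, 100003 for 10.3) follows PostgreSQL's own convention.

```python
import re


def parse_pg_version(text):
    """Convert '9.4.7' or '10.3' into a server_version_num-style integer."""
    match = re.match(r"^(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if not match:
        raise ValueError("unrecognised version string: %r" % text)
    first = int(match.group(1))
    second = int(match.group(2) or 0)
    third = int(match.group(3) or 0)
    if first >= 10:
        # Two components: major.minor
        return first * 10000 + second
    # Three components: the first two form the major version, the third is the minor
    return first * 10000 + second * 100 + third
```

Code that always splits the string into exactly three parts would fail on a string such as "10.3".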

Test fails with latest pytest-catchlog 1.2.0

The build is OK with pytest-catchlog==1.1 (built 8 days ago).
But it fails with latest pytest-catchlog==1.2.0

$ tox -e py34                                                                                         
GLOB sdist-make: /srv/proj/barman/setup.py
py34 recreate: /srv/proj/barman/.tox/py34
py34 installdeps: pytest, mock, pytest-catchlog, pytest-timeout
py34 inst: /srv/proj/barman/.tox/dist/barman-1.5.1b1.zip
py34 installed: argcomplete==1.0.0,argh==0.26.1,barman==1.5.1b1,mock==1.3.0,pbr==1.8.1,psycopg2==2.6.1,py==1.4.30,pytest==2.8.2,pytest-catchlog==1.2.0,pytest-timeout==0.5,python-dateutil==2.4.2,six==1.10.0,wheel==0.24.0
py34 runtests: PYTHONHASHSEED='1069482987'
py34 runtests: commands[0] | py.test tests
=========================================================== test session starts ============================================================
platform linux -- Python 3.4.3+, pytest-2.8.2, py-1.4.30, pluggy-0.3.1
rootdir: /srv/proj/barman, inifile: 
plugins: catchlog-1.2.0, timeout-0.5
collected 265 items 

...

========================================= 2 failed, 263 passed, 66 pytest-warnings in 1.72 seconds =========================================
ERROR: InvocationError: '/srv/proj/barman/.tox/py34/bin/py.test tests'
_________________________________________________________________ summary __________________________________________________________________
ERROR:   py34: commands failed

More a recommendation than an issue...

I've noticed that the "barman check" output includes a "failed backups" check. However, it doesn't seem to be calculated correctly. I think the problem stems from barman list-backup and the fact that a backup that loses its PID never fails.

See backup:
[barman@em-vus-pgbuilder ~]$ barman list-backup ros_management
ros_management 20160531T165506 - STARTED

Even though I deliberately killed the PID here (I'm trying to write some backup failure alerts), the barman backup status remains "started" indefinitely and never turns to a "failed" state. I assume this is because the communication is gone, but I'd expect a timeout or perhaps some kind of PID failure detection?
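
The PID failure detection mentioned here can be sketched on POSIX systems by sending signal 0, which checks for the process without affecting it. This assumes the backup's PID was recorded somewhere when it started; the function name is hypothetical, and PID reuse by an unrelated process is not handled.

```python
import errno
import os


def pid_is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs error checking only
    except OSError as exc:
        # EPERM means the process exists but belongs to another user.
        return exc.errno == errno.EPERM
    return True
```

A monitoring job could mark a STARTED backup as failed once its recorded PID no longer passes this check.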

Thanks and keep up the great work

Problem with archive_mode = always setting in PG 9.5

Barman does not work with the archive_mode = always setting. This setting is needed to archive WAL for a cascading slave, or for a warm standby set up from a slave.

I had the following message with this setting:

2016-03-08 05:56:31 UTC [2181-1] postgres@postgres ERROR:  invalid input syntax for type boolean: "always"
2016-03-08 05:56:31 UTC [2181-2] postgres@postgres STATEMENT:  SELECT *, current_setting('archive_mode')::BOOLEAN AND (last_failed_wal IS NULL OR last_failed_wal LIKE '%.history' AND substring(last_failed_wal from 1 for 8) <= substring(last_archived_wal from 1 for 8) OR last_failed_wal <= last_archived_wal) AS is_archiving, CAST (archived_count AS NUMERIC) / EXTRACT (EPOCH FROM age(now(), stats_reset)) AS current_archived_wals_per_second FROM pg_stat_archiver
2016-03-08 05:56:31 UTC [2181-3] postgres@postgres ERROR:  current transaction is aborted, commands ignored until end of transaction block
2016-03-08 05:56:31 UTC [2181-4] postgres@postgres STATEMENT:  SELECT count(*) FROM pg_extension WHERE extname = 'pgespresso'

This happens because the query casts archive_mode to boolean. archive_mode changed from a boolean to an enum in PostgreSQL 9.5.

I'd suggest correcting postgres.py, lines 380 to 391, as well as its caller.
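
One way to avoid the cast is to fetch the setting as text and interpret it on the client side. This is a sketch of that idea, not the change that was actually made to postgres.py; the function name is hypothetical.

```python
def archiving_enabled(archive_mode):
    """Interpret archive_mode as text; 'always' exists from 9.5 onwards."""
    return archive_mode.strip().lower() in ("on", "always")
```

With this, 'on' and 'always' both count as enabled, and 'off' does not.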

New Wal check in 1.6.1 breaking new installations.

I'm automating installation using a chef cookbook, and this unexpectedly breaks working backup functionality when upgrading from 1.6.0 to 1.6.1.

To be clear, barman show-server, barman check, and barman backup would all execute as expected on version 1.6.0. Now on 1.6.1, I cannot get past this check. The postgres server settings are as documented.

On server to be backed up:

postgres=# show wal_level;
  wal_level
-------------
 hot_standby
(1 row)

postgres=# show archive_mode;
 archive_mode
--------------
 on
(1 row)

postgres=# show archive_command;
                           archive_command
----------------------------------------------------------------------
 rsync -a %p [email protected]:/var/lib/barman/master/incoming/%f
(1 row)

From barman show-server:

archive_command: rsync -a %p [email protected]:/var/lib/barman/master/incoming/%f
incoming_wals_directory: /var/lib/barman/master/incoming

I also found this thread:
https://groups.google.com/forum/#!topic/pgbarman/M-eFUCA1nHA

The barman check command still fails even after running:

barman switch-xlog --force <server>
barman cron

Barman list-files shows only one timeline

Hi,
Is it intended behavior that barman list-files only shows one timeline?

I have a master/slave postgres cluster running and thus after a failover the timeline changes. The wal shipping is working fine and is processed by barman cron correctly. However

barman list-files --target wal fsmpostgres 20160223T070002

lists just the WAL files from the previous timeline, e.g.

/var/lib/barman/fsmpostgres/wals/0000000200000021/000000020000002100000089
/var/lib/barman/fsmpostgres/wals/0000000200000021/00000002000000210000008A
/var/lib/barman/fsmpostgres/wals/0000000200000021/00000002000000210000008B
/var/lib/barman/fsmpostgres/wals/0000000200000021/00000002000000210000008C
/var/lib/barman/fsmpostgres/wals/0000000200000021/00000002000000210000008D
/var/lib/barman/fsmpostgres/wals/0000000200000021/00000002000000210000008E
/var/lib/barman/fsmpostgres/wals/0000000200000021/00000002000000210000008F
/var/lib/barman/fsmpostgres/wals/00000003.history

The existing directory with new timelines /var/lib/barman/fsmpostgres/wals/0000000300000021 is obviously being ignored. I tried to rebuild the xlog, but this didn't change anything.

As a result the icinga check for monitoring the last-wal is failing, because wal files of the new timeline are not taken into consideration.

show-backup failed on compressed WAL history files

The WAL timeline history file is also compressed (is that intentional?), but show-backup doesn't decompress it:

$ barman show-backup servername 20161106T152222
EXCEPTION: /backup/barman/servername/wals/00000004.history

$ mv /backup/barman/servername/wals/00000004.history /backup/barman/servername/wals/00000004.history.gz        
$ gunzip /backup/barman/servername/wals/00000004.history.gz

$ barman show-backup servername 20161106T152222
Backup 20161106T152222:
Server Name            : servername
Status                 : DONE
PostgreSQL Version     : 90409
...

log contents:

2016-11-07 00:39:41,205 [5707] barman.cli ERROR: /backup/barman/servername/wals/00000004.history
See log file for more details.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/barman/cli.py", line 1022, in main
    p.dispatch(pre_call=global_config)
  File "/usr/lib/python2.7/site-packages/argh/helpers.py", line 55, in dispatch
    return dispatch(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "/usr/lib/python2.7/site-packages/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "/usr/lib/python2.7/site-packages/argh/dispatching.py", line 231, in _call
    result = function(namespace_obj)
  File "/usr/lib/python2.7/site-packages/barman/cli.py", line 539, in show_backup
    server.show_backup(backup_info)
  File "/usr/lib/python2.7/site-packages/barman/server.py", line 1687, in show_backup
    backup_ext_info = self.get_backup_ext_info(backup_info)
  File "/usr/lib/python2.7/site-packages/barman/server.py", line 1673, in get_backup_ext_info
    forked_after=backup_info.end_xlog)
  File "/usr/lib/python2.7/site-packages/barman/server.py", line 1822, in get_children_timelines
    history_info = xlog.decode_history_file(history_path)
  File "/usr/lib/python2.7/site-packages/barman/xlog.py", line 354, in decode_history_file
    raise BadHistoryFileContents(path)

BadHistoryFileContents: /backup/barman/servername/wals/00000004.history
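
A minimal sketch of reading a file that may or may not be gzip-compressed follows, by checking the two-byte gzip magic number before choosing how to open it. The function name and the demonstration contents are assumptions, and this is not Barman's implementation.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_maybe_gzipped(path):
    """Return the file's bytes, decompressing when it starts with gzip magic."""
    with open(path, "rb") as raw:
        magic = raw.read(2)
    opener = gzip.open if magic == GZIP_MAGIC else open
    with opener(path, "rb") as stream:
        return stream.read()


# Demonstration with a temporary file holding made-up contents
fd, demo_path = tempfile.mkstemp()
os.close(fd)
with gzip.open(demo_path, "wb") as compressed:
    compressed.write(b"example contents\n")
demo_data = read_maybe_gzipped(demo_path)
os.remove(demo_path)
```

The same call returns plain files unchanged, so a reader built this way would not depend on the file extension.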

streaming_archiver and streaming_conninfo not working

Hi,

I am using 1.6.0a1 to get the pg_receivexlog working with postgresql 9.3 on centos7. I have my configuration attached. I manually attempted the pg_receivexlog as the barman user with success. I can 'barman check server0' - I see a successful connection in the postgres log and the output shows: "pg_receivexlog: OK" and "pg_receivexlog compatible: OK".

'barman cron' does not make a connection, though it outputs: "Starting WAL archiving for server server0".

Thanks,
-dkw

config.txt

diagnose.txt

barman archiver error

barman check output:

Server pg92:
PostgreSQL: OK
superuser: OK
PostgreSQL streaming: OK
wal_level: OK
directories: OK
retention policy settings: OK
backup maximum age: OK (no last_backup_maximum_age provided)
compression settings: OK
failed backups: OK (there are 0 failed backups)
minimum redundancy requirements: OK (have 2 backups, expected at least 2)
pg_basebackup: OK
pg_basebackup compatible: OK
pg_basebackup supports tablespaces mapping: OK (pg_basebackup can be used as long as tablespaces support is not required)
archive_mode: OK
archive_command: OK
pg_receivexlog: OK
pg_receivexlog compatible: OK
receive-wal running: OK
archiver errors: FAILED (unknown: 1)

diagnose info:

 "pg92": {
        "backups": {
            "20161008T121110": {
                "backup_id": "20161008T121110",
                "backup_label": null,
                "begin_offset": 32,
                "begin_time": "Sat Oct  8 08:10:47 2016",
                "begin_wal": "000000010000000A000000E3",
                "begin_xlog": "A/E3000020",
                "config_file": "/s2/postgres/data/postgresql.conf",
                "deduplicated_size": 1890979321,
                "end_offset": 16777216,
                "end_time": "Sat Oct  8 16:10:43 2016",
                "end_wal": "000000010000000A000000E3",
                "end_xlog": "A/E4000000",
                "error": null,
                "hba_file": "/s2/postgres/data/pg_hba.conf",
                "ident_file": "/s2/postgres/data/pg_ident.conf",
                "included_files": null,
                "mode": "postgres",
                "pgdata": "/s2/postgres/data",
                "server_name": "pg92",
                "size": 1890979321,
                "status": "DONE",
                "tablespaces": null,
                "timeline": 1,
                "version": 90203
            },
            "20161012T105632": {
                "backup_id": "20161012T105632",
                "backup_label": null,
                "begin_offset": 32,
                "begin_time": "Wed Oct 12 02:56:35 2016",
                "begin_wal": "000000010000000A000000ED",
                "begin_xlog": "A/ED000020",
                "config_file": "/s2/postgres/data/postgresql.conf",
                "deduplicated_size": 1895742096,
                "end_offset": 16777216,
                "end_time": "Wed Oct 12 10:56:30 2016",
                "end_wal": "000000010000000A000000ED",
                "end_xlog": "A/EE000000",
                "error": null,
                "hba_file": "/s2/postgres/data/pg_hba.conf",
                "ident_file": "/s2/postgres/data/pg_ident.conf",
                "included_files": null,
                "mode": "postgres",
                "pgdata": "/s2/postgres/data",
                "server_name": "pg92",
                "size": 1895742096,
                "status": "DONE",
                "tablespaces": null,
                "timeline": 1,
                "version": 90203
            }
        },
        "config": {
            "active": true,
            "archiver": true,
            "archiver_batch_size": 0,
            "backup_directory": "/backdisk/barman/pg92",
            "backup_method": "postgres",
            "backup_options": "concurrent_backup",
            "bandwidth_limit": null,
            "barman_home": "/backdisk/barman",
            "barman_lock_directory": "/backdisk/barman",
            "basebackup_retry_sleep": 300,
            "basebackup_retry_times": 3,
            "basebackups_directory": "/backdisk/barman/pg92/base",
            "check_timeout": 30,
            "compression": null,
            "conninfo": "host=192.168.92.236 port=5432 user=lhs dbname=postgres password=pgpass",
            "custom_compression_filter": null,
            "custom_decompression_filter": null,
            "description": "pg92 Postgresql Database (Streaming-Only)",
            "disabled": false,
            "errors_directory": "/backdisk/barman/pg92/errors",
            "immediate_checkpoint": false,
            "incoming_wals_directory": "/backdisk/barman/pg92/incoming",
            "last_backup_maximum_age": null,
            "minimum_redundancy": 2,
            "msg_list": [],
            "name": "pg92",
            "network_compression": false,
            "path_prefix": "/usr/lib/postgresql/9.2/bin",
            "post_archive_retry_script": null,
            "post_archive_script": null,
            "post_backup_retry_script": null,
            "post_backup_script": "/backdisk/barman_script/post_backup_script.sh",
            "pre_archive_retry_script": null,
            "pre_archive_script": null,
            "pre_backup_retry_script": null,
            "pre_backup_script": null,
            "recovery_options": "",
            "retention_policy": "window 7 w",
            "retention_policy_mode": "auto",
            "reuse_backup": null,
            "slot_name": null,
            "ssh_command": null,
            "streaming_archiver": true,
            "streaming_archiver_batch_size": 0,
            "streaming_archiver_name": "barman_receive_wal",
            "streaming_backup_name": "barman_streaming_backup",
            "streaming_conninfo": "host=192.168.92.236 port=5432 user=lhs dbname=postgres password=pgpass",
            "streaming_wals_directory": "/backdisk/barman/pg92/streaming",
            "tablespace_bandwidth_limit": null,
            "wal_retention_policy": "simple-wal 7 w",
            "wals_directory": "/backdisk/barman/pg92/wals"
        },
        "status": {
            "archive_command": "test ! -f /s2/postgres/archive/%f && cp %p /s2/postgres/archive/%f",
            "archive_mode": "on",
            "config_file": "/s2/postgres/data/postgresql.conf",
            "connection_error": null,
            "current_size": 1898866308.0,
            "current_xlog": "000000010000000A000000F9",
            "data_directory": "/s2/postgres/data",
            "hba_file": "/s2/postgres/data/pg_hba.conf",
            "ident_file": "/s2/postgres/data/pg_ident.conf",
            "is_superuser": true,
            "pg_basebackup_bwlimit": false,
            "pg_basebackup_compatible": true,
            "pg_basebackup_installed": true,
            "pg_basebackup_path": "/usr/lib/postgresql/9.2/bin/pg_basebackup",
            "pg_basebackup_tbls_mapping": false,
            "pg_basebackup_version": "9.2.18",
            "pg_receivexlog_compatible": true,
            "pg_receivexlog_installed": true,
            "pg_receivexlog_path": "/usr/lib/postgresql/9.2/bin/pg_receivexlog",
            "pg_receivexlog_supports_slots": false,
            "pg_receivexlog_synchronous": false,
            "pg_receivexlog_version": "9.2.18",
            "pgespresso_installed": false,
            "replication_slot": null,
            "replication_slot_support": false,
            "server_txt_version": "9.2.3",
            "streaming": true,
            "streaming_supported": true,
            "synchronous_standby_names": [
                ""
            ],
            "systemid": "5851801377755352795",
            "timeline": 1,
            "wal_level": "hot_standby",
            "xlogpos": "A/F92DE440"
        }
    },

export and import feature.

I know barman has list-files, which sort of almost works for exporting (it doesn't list empty dirs, so they don't get exported). But there is no way to import a backup back into barman.

It would be awesome if there was an easy way to export and import backups in/out of barman.
WHY?
Maybe we want to keep yearly backups, or for some business purpose we need a snapshot of the DB as of some particular day. Currently to do that you have to turn off all the retention policy stuff and manually delete backups.

It would be awesome if, aside from the retention policy, you could either mark a backup as 'special' (so the retention policy won't auto-delete it) or, as a much more generic solution, export and import backups (you could then dump them into S3, Amazon Glacier, or wherever and keep them forever if desired).

Different behaviour from pg_basebackup

Following issue #65, make backup copy process via rsync identical to pg_basebackup by excluding files like:

  • "pgsql_tmp*"
  • "postgresql.auto.conf.tmp"
  • "postmaster.pid"
  • "postmaster.opts"
  • "pg_stat_tmp/*"
  • "pg_replslot/*"
  • "pg_dynshmem/*"
  • "pg_notify/*"
  • "pg_serial/*"
  • "pg_subtrans/*"
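
As a rough illustration of the request above, the exclusions could be passed to rsync as `--exclude` patterns. This is only a sketch: the function name, the `-a` flag choice, and the example paths are assumptions made here, not Barman's actual implementation. The patterns themselves are copied from the list above.

```python
# Patterns copied verbatim from the issue; everything else is illustrative.
EXCLUDE_PATTERNS = [
    "pgsql_tmp*",
    "postgresql.auto.conf.tmp",
    "postmaster.pid",
    "postmaster.opts",
    "pg_stat_tmp/*",
    "pg_replslot/*",
    "pg_dynshmem/*",
    "pg_notify/*",
    "pg_serial/*",
    "pg_subtrans/*",
]


def build_rsync_command(source, destination, patterns=EXCLUDE_PATTERNS):
    """Return an rsync argument list that skips the given patterns.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    command = ["rsync", "-a"]
    for pattern in patterns:
        command.append("--exclude=" + pattern)
    command.extend([source, destination])
    return command
```

The resulting list can be handed to `subprocess.call`, for example `build_rsync_command("postgres@dbhost:/s2/postgres/data/", "/tmp/base/")` (both paths are placeholders).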

Barman error folder

Hi.

I have 6 files in the Barman errors folder.

  • 00000003.history.20160705T220101Z.unknown
  • 00000003.history.20160705T223501Z.unknown
  • 00000003.history.20160705T225402Z.unknown
  • 00000003.history.20160711T051405Z.unknown
  • 00000003.history.20160711T111819Z.unknown
  • 00000003.history.20160711T112704Z.unknown

All of them have similar content. For example, 00000003.history.20160705T220101Z.unknown
contains "1 F9F/2861E770 before 2016-07-05 15:46:06.743379+03"

Barman check gives result
archiver errors: FAILED (unknown: 6)

Where should I start looking for the error? PostgreSQL 9.5.3, Barman 1.6.1.

Thanks,
Indrek

WALs are removed from the archive if compression is activated

Hi guys,

While testing the beta version of the 1.6.0 release, I found that WALs shipped through a streaming connection are removed from the archive when WAL compression is activated, regardless of the chosen algorithm.

Here is what the Barman log reported when pigz was chosen as the compression algorithm:

2016-02-08 10:06:01,564 [26066] barman.wal_archiver INFO: Archiving master/00000001000000000000005C
2016-02-08 10:06:01,565 [26066] barman.command_wrappers DEBUG: Command: 'command(){ pigz -c > "$2" < "$1";}; command \'/srv/master/incoming/00000001000000000000005C\' \'/srv/master/wals/0000000100000000/00000001000000000000005C.tmp\''
2016-02-08 10:06:02,720 [26066] barman.command_wrappers DEBUG: Command return code: 0
2016-02-08 10:06:02,721 [26066] barman.command_wrappers DEBUG: Command stdout:
2016-02-08 10:06:02,721 [26066] barman.command_wrappers DEBUG: Command stderr:
2016-02-08 10:06:02,733 [26066] barman.wal_archiver INFO: Archiving master/00000001000000000000005C
2016-02-08 10:06:02,733 [26066] barman.command_wrappers DEBUG: Command: 'command(){ pigz -c -d > "$2" < "$1" && rm -f "$1";}; command \'/srv/master/wals/0000000100000000/00000001000000000000005C\' \'/srv/master/wals/0000000100000000/00000001000000000000005C.uncompressed\''
2016-02-08 10:06:02,844 [26066] barman.command_wrappers DEBUG: Command return code: 0
2016-02-08 10:06:02,845 [26066] barman.command_wrappers DEBUG: Command stdout:
2016-02-08 10:06:02,845 [26066] barman.command_wrappers DEBUG: Command stderr:

It looks like Barman compresses the WAL, then decompresses and removes it, each time a WAL is archived. We are investigating the issue.

Improve log messages in case of trashed WAL files

When WAL files are obsolete for some reason barman emits the following log message:

 barman.backup INFO: Older than first backup. Trashing file <filename> from <servername>

This doesn't give enough information about the real reason for the trashing.

Investigate possible improvements.

Never removed files from $PGDATA/pg_xlog/archive_status/*.done

I have an issue with a large number of old files (>100k) in /var/lib/postgresql//main/pg_xlog/archive_status/.done

"archive_command": "rsync -a --remove-source-files %p barman@server:/var/lib/barman/client/incoming/%f",

  1. Does Barman use these files? If so, when?
  2. Should I remove these files manually (e.g. in a post_backup_script)?
  3. Why doesn't Barman do this itself?

Incremental backups

Hello.
I would like to discuss page-level incremental backups.
I’ve created a proof-of-concept fork of Barman here.
There are no docs or unit tests right now, but this will be fixed in the near future.

Motivation:
We have a large number of databases with a pgdata size of about 3 terabytes, and about 1% of the data changes every 24 hours.
Unfortunately, Barman backups with hard links give us a deduplication ratio of about 45%: small changes are spread across many data files, so many data files change between backups, even though the ratio of changed pages is about 2%.

The solution to this problem seems simple: back up only the changed pages.
I’ve created a simple script named barman-incr (it is in the bin dir of the source code). It handles backup and restore operations. Barman runs it on the database host and passes the LSN, the timestamp, and the list of files from the previous backup. Then we open each data file and read every page in it (if the file we opened turns out not to be a data file, we take all of it). If a page's LSN is >= the provided LSN, we include that page in the backup.
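
The page-selection step described above could look roughly like the following. This is a minimal sketch, not the code from barman-incr: the function names are invented here, the block size is assumed to be the PostgreSQL default of 8192 bytes, and the page LSN is assumed to be stored little-endian in the first 8 bytes of the page header (high 32 bits first), as on x86 servers. Detection of non-data files is left out.

```python
import struct

PAGE_SIZE = 8192  # assumption: default PostgreSQL block size


def parse_lsn(text):
    """Convert a textual LSN such as 'A/F92DE440' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def page_lsn(page):
    """Read the LSN from the first 8 bytes of a page header.

    Assumes a little-endian server: two uint32 values, high half first.
    """
    xlogid, xrecoff = struct.unpack_from("<II", page, 0)
    return (xlogid << 32) | xrecoff


def changed_pages(path, since_lsn):
    """Yield (page_number, page_bytes) for pages with LSN >= since_lsn."""
    with open(path, "rb") as datafile:
        page_number = 0
        while True:
            page = datafile.read(PAGE_SIZE)
            if len(page) < PAGE_SIZE:
                break  # end of file; a trailing partial page is ignored here
            if page_lsn(page) >= since_lsn:
                yield page_number, page
            page_number += 1
```

Storing the page number next to each page is what would let a restore step write the page back at offset `page_number * PAGE_SIZE` in the file taken from the previous backup.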

Some tests:
Database with pgdata size 2.7T, 120G wals per 24h.
Full backup size is 537G (compressed with gzip -3), time to take backup - 7h.
Incremental backup size is 14G (also compressed with gzip -3), time to take backup - 30m.

I’ve also tested restore consistency (restored the database to a point in time and compared the pg_dump result with a paused replica).

Implementing block change tracking (Oracle DBAs should be familiar with this; here is a white paper about it) will require some changes in the WAL archiving process. I’ll present some thoughts and test results on this in Q1 2016.

EXCEPTION: 'str' object has no attribute 'backup_id'

When trying to remove a backup from a server, Barman exits with the following exception:
EXCEPTION: 'str' object has no attribute 'backup_id'

How to replicate the error:

  • Perform a full backup on a server
  • Start a new backup on the same server
  • Try to remove the first backup while the second backup is still running

Doing backup-recover on the same destination directory rewrites the WALs

I ran the same backup recover command again after 30 minutes, and I noticed that the rsync command transfers only the changed files, except for the pg_xlog/ content, which is purged and rewritten on the destination server.

The recovery could be faster if we preserved the WALs when launching the command again.

Maybe the pg_xlog content could be appended to the exclude_and_protect variable for rsync.

Backup fails when using password with .pgpass for authentication

To provide a more secure setup we changed from the trust authentication to password based authentication for our backup user using a .pgpass file to store the credentials. This apparently isn't recognized by barman, or at least I have not configured barman to take advantage of this.

Running barman check dbhost returns an error:

Server dbhost:
        ssh: OK
        PostgreSQL: FAILED
        directories: OK
        retention policy settings: OK
        backup maximum age: OK (no last_backup_maximum_age provided)
        compression settings: OK
        minimum redundancy requirements: OK (have 1 backups, expected at least 1)

barman backup dbhost states:

Starting backup for server dbhost in /home/barman/dbhost/base/20160822T162300
ERROR: Backup failed issuing start backup command.
DETAILS: Cannot connect to postgres: fe_sendauth: no password supplied

The documentation indicates support for using trust, but is password based authentication not at all supported?
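
For reference, libpq-based clients look up passwords in a password file (by default `~/.pgpass` of the operating-system user running the client) whose lines have the form `hostname:port:database:username:password`, and on Unix the file is ignored unless its permissions are 0600 or stricter. Whether that explains the failure above is not established here. The sketch below only shows how such a file could be generated; the helper names are hypothetical.

```python
import os


def pgpass_line(host, port, database, user, password):
    """Format one password-file entry, escaping ':' and backslashes."""
    def escape(value):
        return str(value).replace("\\", "\\\\").replace(":", "\\:")
    return ":".join(escape(v) for v in (host, port, database, user, password))


def write_pgpass(path, lines):
    """Write the entries and restrict the file to its owner (mode 0600)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    # Enforce the mode even if the file already existed with looser bits.
    os.chmod(path, 0o600)
```

A call such as `write_pgpass(os.path.expanduser("~/.pgpass"), [pgpass_line("dbhost", 5432, "*", "backup_user", "secret")])` would have to be run as the same user that runs barman; the user name and password shown are placeholders.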

"barman cron" doesn't understand configuration file path

I use a custom path for the configuration file. Everything works (backup, list-server, ...) except the cron feature.

$ barman -v
1.5.1

$ barman -c ~/etc/barman.conf list-server
# ok 

$ barman -c ~/etc/barman.conf backup all
# fine, too

$ barman -c ~/etc/barman.conf cron
Starting WAL archiving for server example.net

$ Could not find any configuration file at default locations.
Check Barman's documentation for more help

It seems barman cron starts a subprocess that doesn't correctly handle configuration file detection. As the output above shows, the error message comes after the shell prompt (the last $).

Barman 1.5.0 fails against PostgreSQL 8.4

Barman 1.5.0 does not work correctly with PostgreSQL 8.4, due to a check of the wal_level parameter.
The parameter does not exist on PostgreSQL 8.4.

Manage PostgreSQL checks differently, always considering older versions.
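
One way to make such a check version-aware is to gate it on the server's numeric version, since `wal_level` was introduced in PostgreSQL 9.0. The following is a sketch under that assumption only; the function names and the shape of the `settings` dictionary are invented for illustration and are not Barman's API.

```python
def wal_level_check_applies(server_version_num):
    """wal_level exists from PostgreSQL 9.0 (version number 90000) onward."""
    return server_version_num >= 90000


def check_wal_level(server_version_num, settings):
    """Return (ok, detail); skip the check on servers that predate wal_level."""
    if not wal_level_check_applies(server_version_num):
        return True, "skipped: wal_level not available before 9.0"
    value = settings.get("wal_level")
    # 'minimal' does not log enough information for WAL archiving.
    ok = value is not None and value != "minimal"
    return ok, "wal_level=%s" % value
```

With this shape, `check_wal_level(80400, {})` passes without ever reading the missing parameter, while a 9.x server is still validated.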

Correct backup postgres via streaming in kubernetes

Hello, is there a complete example setup for backing up postgres with barman?

We use kubernetes and would like to back up our postgres with a barman sidecar container. Does anyone have hints or examples for us?

Regards, Josef

EDIT: we use a postgres 9.5

unexpected EOF on client connection with an open transaction

Hi,

I don't really know if it is an issue regarding barman, but each time we run a full backup, we get the following in the logs at the end:

2016-06-26 06:31:43 CEST [10475-1] postgres@postgres LOG: restore point "barman_20160626T040001" created at 1CC/2C0000C8
2016-06-26 06:31:43 CEST [10475-2] postgres@postgres STATEMENT: SELECT pg_create_restore_point('barman_20160626T040001')
2016-06-26 06:31:43 CEST [10475-3] postgres@postgres LOG: could not receive data from client: Connection reset by peer
2016-06-26 06:31:43 CEST [10475-4] postgres@postgres LOG: unexpected EOF on client connection with an open transaction

Wrong calculation of backup size

In case of tablespaces, Barman 1.5.0 does not correctly calculate the backup size. Please review the backup_fsync_and_set_sizes() function.

Thanks to Matthew Oldham for pointing it out.

EXCEPTION: [Errno 13] Permission denied: '/tmp/barman_recovery-aYBKVi/postgresql.conf'

When I try to recover a backup, I get the following error:

[root@barman-qa-01 ~]# barman recover --remote-ssh-command 'ssh postgres@barman-qa-02' test-01 20161107T124401 /var/lib/pgsql/9.5/data/
Starting remote restore for server test-01 using backup 20161107T124401
Destination directory: /var/lib/pgsql/9.5/data/
Copying the base backup.
Copying required WAL segments.
Generating archive status files
Identify dangerous settings in destination directory.
EXCEPTION: [Errno 13] Permission denied: '/tmp/barman_recovery-aYBKVi/postgresql.conf'
See log file for more details.

barman.log:

2016-11-08 15:38:30,903 [6715] barman.command_wrappers DEBUG: Command: "ssh postgres@barman-qa-02 'test -d /var/lib/pgsql/9.5/data/pg_xlog/archive_status'"
2016-11-08 15:38:31,062 [6715] barman.command_wrappers DEBUG: Command return code: 0
2016-11-08 15:38:31,063 [6715] barman.command_wrappers DEBUG: Command stdout:
2016-11-08 15:38:31,063 [6715] barman.command_wrappers DEBUG: Command stderr:
2016-11-08 15:38:31,065 [6715] barman.recovery_executor INFO: Identify dangerous settings in destination directory.
2016-11-08 15:38:31,067 [6715] barman.cli ERROR: [Errno 13] Permission denied: '/tmp/barman_recovery-Yd5gHN/postgresql.conf'
See log file for more details.
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/barman/cli.py", line 1022, in main
    p.dispatch(pre_call=global_config)
  File "/usr/lib/python2.6/site-packages/argh/helpers.py", line 47, in dispatch
    return dispatch(self, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/argh/dispatching.py", line 121, in dispatch
    for line in lines:
  File "/usr/lib/python2.6/site-packages/argh/dispatching.py", line 197, in _execute_command
    for line in result:
  File "/usr/lib/python2.6/site-packages/argh/dispatching.py", line 153, in _call
    result = args.function(args)
  File "/usr/lib/python2.6/site-packages/barman/cli.py", line 411, in recover
    remote_command=args.remote_ssh_command)
  File "/usr/lib/python2.6/site-packages/barman/server.py", line 1164, in recover
    target_xid, target_name, exclusive, remote_command)
  File "/usr/lib/python2.6/site-packages/barman/backup.py", line 445, in recover
    exclusive, remote_command)
  File "/usr/lib/python2.6/site-packages/barman/recovery_executor.py", line 229, in recover
    self._analyse_temporary_config_files(recovery_info)
  File "/usr/lib/python2.6/site-packages/barman/recovery_executor.py", line 884, in _analyse_temporary_config_files
    "%s.origin" % conf_file)
  File "/usr/lib/python2.6/site-packages/barman/recovery_executor.py", line 946, in _pg_config_mangle
    with open(filename, 'w') as f:
IOError: [Errno 13] Permission denied: '/tmp/barman_recovery-Yd5gHN/postgresql.conf'

file:

[root@barman-qa-01 barman_recovery-Yd5gHN]# ls -la /tmp/barman_recovery-Yd5gHN/postgresql.conf
-r--r--r-- 1 barman barman 758 Nov  7 12:44 /tmp/barman_recovery-Yd5gHN/postgresql.conf

my system:

[root@barman-qa-01 ~]# cat /etc/redhat-release
CentOS release 6.5 (Final)
[root@barman-qa-01 ~]# python --version
Python 2.6.6
[root@barman-qa-01 ~]# rpm -aq *barman*
barman-2.0-1.rhel6.noarch

get-wal tries to decompress a WAL file partially received

We've configured barman-wal-restore for a standby server (loosely coupled to the production system).

We noticed some tracebacks in barman.log because get-wal tries to decompress a WAL file which is not complete.

2015-10-25 12:22:03,265 [14064] barman.cli ERROR: {'err': u'\ngzip: stdin: unexpected end of file\n', 'ret': 1, 'out': u''}
See log file for more details.
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/barman/cli.py", line 828, in main
    p.dispatch(pre_call=global_config)
  File "/usr/lib/python2.7/dist-packages/argh/helpers.py", line 53, in dispatch
    return dispatch(self, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/argh/dispatching.py", line 124, in dispatch
    for line in lines:
  File "/usr/lib/python2.7/dist-packages/argh/dispatching.py", line 200, in _execute_command
    for line in result:
  File "/usr/lib/python2.7/dist-packages/argh/dispatching.py", line 156, in _call
    result = args.function(args)
  File "/usr/lib/python2.7/dist-packages/barman/cli.py", line 530, in get_wal
    output_directory=output_directory)
  File "/usr/lib/python2.7/dist-packages/barman/server.py", line 1180, in get_wal
    wal_compressor.decompress(source_file, uncompressed_file.name)
  File "/usr/lib/python2.7/dist-packages/barman/command_wrappers.py", line 118, in __call__
    self.getoutput(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/barman/command_wrappers.py", line 158, in getoutput
    ret=self.ret, out=self.out, err=self.err))
CommandFailedException: {'err': u'\ngzip: stdin: unexpected end of file\n', 'ret': 1, 'out': u''}

Partially written files should be saved with a .tmp extension and moved (mv) to their final location only once compression is complete.
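
The write-to-temporary-then-rename pattern suggested above can be sketched as follows. This is an illustration of the general technique, not Barman's code: the function name is hypothetical, gzip is used only because it appears in the traceback, and the rename is atomic only on POSIX systems when both paths are on the same filesystem.

```python
import gzip
import os
import shutil


def compress_atomically(source_path, final_path):
    """Compress source_path into final_path without exposing partial output."""
    tmp_path = final_path + ".tmp"
    # Readers looking for final_path never see the file while it is growing.
    with open(source_path, "rb") as src, gzip.open(tmp_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # The complete file appears under its final name in a single step.
    os.rename(tmp_path, final_path)
```

A reader such as get-wal would then either find no file at the final path or find a fully compressed one.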
