Comments (3)
Hi!
Yes, wal-g returns a non-zero exit code on failure, and Postgres will then retry archiving the file. Please check:
- what is in pg_stat_archive
- change the AWS endpoint and run your archive command manually. Does it return a non-zero exit code?
- your restore_command. The error in the log that you posted is not from wal-fetch; it is from streaming replication
- the contents of pg_wal/archive_status. It holds .ready files for segments waiting to be archived and .done files for segments that have already been archived.
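For the last check, a small shell sketch can summarise the directory instead of listing it by eye. The function name count_status and the example path are illustrative assumptions, not part of wal-g or Postgres; on 9.6 the directory is pg_xlog/archive_status rather than pg_wal/archive_status.

```shell
# Sketch: count status files with a given suffix in an archive_status directory.
# count_status is a hypothetical helper name.
count_status() {
    # $1 = archive_status directory, $2 = suffix ("ready" or "done")
    n=0
    for f in "$1"/*."$2"; do
        # When nothing matches, the glob stays literal; skip that case.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Example usage (path is an assumption, adjust to your data directory):
#   count_status /srv/postgresql/9.6/main/pg_xlog/archive_status ready
#   count_status /srv/postgresql/9.6/main/pg_xlog/archive_status done
```

A steadily growing count of .ready files would suggest the archiver is falling behind or failing repeatedly.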
from wal-g.
Hi, thanks for the quick reply.
- what is in pg_stat_archive
It seems there are no errors?
 archived_count |    last_archived_wal     |      last_archived_time       | failed_count | last_failed_wal | last_failed_time |          stats_reset
----------------+--------------------------+-------------------------------+--------------+-----------------+------------------+-------------------------------
          21949 | 000000080000B7EB00000004 | 2023-10-12 09:54:30.988245+00 |            0 |                 |                  | 2023-10-10 12:05:28.317102+00
- change AWS endpoint and run your archive manually. Does it return non-zero exit code
That was an accidental timeout error. Many WALs before and after the missing one were archived successfully. I tried sending another, existing WAL to a non-existent host, and I tried sending a non-existent WAL. In both cases I got exit code 1, so that part seems to be correct. However, I do not know how to reproduce the exact connection timeout in order to check that specific situation.
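To catch the exit code the next time a real timeout happens, one option is to log it from a thin wrapper. This is only a sketch: run_logged is a hypothetical helper name, and it assumes the wrapper's stderr ends up in the Postgres log. It relies on the fact that Postgres treats a zero exit status from archive_command as success.

```shell
# Sketch: run a command, report its exit code on stderr, and return the same
# code so the caller (for example archive_command) sees the real result.
run_logged() {
    rc=0
    "$@" || rc=$?
    echo "command '$*' exited with code $rc" >&2
    return "$rc"
}

# Example usage inside an archive script (not executed here):
#   run_logged /usr/local/bin/wal-g wal-push "$1"
```

If a timeout ever shows up in the log together with "exited with code 0", that would point at wal-push reporting success for a failed upload.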
- your restore_command. The error in log that you posted is not from wal-fetch, it's from streaming replication
The problem here is not with the restore: the file was never uploaded to S3 by the wal-push command. I have already checked the storage directly. Nevertheless, here are the restore commands:
# recovery.conf
restore_command = '/usr/local/scripts/walg-fetch.sh "%f" "%p"'
archive_cleanup_command = 'pg_archivecleanup /srv/postgresql/9.6/main/pg_xlog "%r"'
# /usr/local/scripts/walg-fetch.sh
set -o noclobber # Avoid overlay files (echo "hi" > foo)
set -o errexit # Used to exit upon error, avoiding cascading errors
set -o nounset # Exposes unset variables
set -o pipefail # Unveils hidden failures
set -o allexport
source /etc/default/wal-g
set +o allexport
# Quote the arguments so that paths containing spaces are passed intact
/usr/local/bin/wal-g wal-fetch "$1" "$2"
- contents of pg_wal/archive_status. There are .ready and .done files for segments waiting to be archived and segments already archived.
The database is heavily loaded. I have also checked that the file is no longer present on the master server in either pg_xlog or pg_xlog/archive_status, neither as .ready nor as .done. The wal-push failure occurred yesterday; because it was unexpected, it went unnoticed, and I only found it today while restoring from backup. But the fact that the file has been removed from both pg_xlog and archive_status may mean that Postgres "thinks" it was uploaded successfully.
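For anyone who wants to check a specific segment the same way, here is a minimal sketch. wal_status is a hypothetical helper name, and the directory and segment name are passed in by the caller.

```shell
# Sketch: report whether a WAL segment is marked ready, done, or has no
# status file at all in the given archive_status directory.
wal_status() {
    # $1 = archive_status directory, $2 = WAL segment file name
    if [ -e "$1/$2.ready" ]; then
        echo ready
    elif [ -e "$1/$2.done" ]; then
        echo done
    else
        echo absent
    fi
}

# Example usage (path is an assumption):
#   wal_status /srv/postgresql/9.6/main/pg_xlog/archive_status 000000080000B7EB00000004
```

In my case this would print "absent" for the missing segment, which is what you would also see for a segment that was archived and later recycled.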
from wal-g.
I got a very similar error while taking a backup with the backup-push command. It is weird: there were no retries, even though I have WALG_S3_MAX_RETRIES=5 set.
Log:
ERROR: 2023/10/13 07:15:30.005112 failed to upload 'shard_2/basebackups_005/base_000000080000B806000000A8_D_000000080000B7A700000047/tar_partitions/part_090.tar.lz4' to bucket 'db-backup': MultipartUpload: upload multipart failed
upload id: fiZmvFty6a_GcOq4q4iQ81KBySt3IvMSw_LOa9bjbGodNitPA1Hizb2EAubMRTZgT.puzOTzJBTV5G8ZMJL4fnlvejD0_9yMb1j5DwEGg0irJG257jKTlkhZpiV2jo60
caused by: RequestError: send request failed
caused by: Put "https://db-backup.s3.dualstack.eu-west-1.amazonaws.com/shard_2/basebackups_005/base_000000080000B806000000A8_D_000000080000B7A700000047/tar_partitions/part_090.tar.lz4?partNumber=10&uploadId=fiZmvFty6a_GcOq4q4iQ81KBySt3IvMSw_LOa9bjbGodNitPA1Hizb2EAubMRTZgT.puzOTzJBTV5G8ZMJL4fnlvejD0_9yMb1j5DwEGg0irJG257jKTlkhZpiV2jo60": write tcp 172.16.200.144:49530->52.218.108.216:443: write: connection timed out
ERROR: 2023/10/13 07:15:30.005127 upload: could not upload 'base_000000080000B806000000A8_D_000000080000B7A700000047/tar_partitions/part_090.tar.lz4'
ERROR: 2023/10/13 07:15:30.005144 failed to upload 'shard_2/basebackups_005/base_000000080000B806000000A8_D_000000080000B7A700000047/tar_partitions/part_090.tar.lz4' to bucket 'db-backup': MultipartUpload: upload multipart failed
upload id: fiZmvFty6a_GcOq4q4iQ81KBySt3IvMSw_LOa9bjbGodNitPA1Hizb2EAubMRTZgT.puzOTzJBTV5G8ZMJL4fnlvejD0_9yMb1j5DwEGg0irJG257jKTlkhZpiV2jo60
caused by: RequestError: send request failed
caused by: Put "https://db-backup.s3.dualstack.eu-west-1.amazonaws.com/shard_2/basebackups_005/base_000000080000B806000000A8_D_000000080000B7A700000047/tar_partitions/part_090.tar.lz4?partNumber=10&uploadId=fiZmvFty6a_GcOq4q4iQ81KBySt3IvMSw_LOa9bjbGodNitPA1Hizb2EAubMRTZgT.puzOTzJBTV5G8ZMJL4fnlvejD0_9yMb1j5DwEGg0irJG257jKTlkhZpiV2jo60": write tcp 172.16.200.144:49530->52.218.108.216:443: write: connection timed out
ERROR: 2023/10/13 07:15:30.005154 Unable to continue the backup process because of the loss of a part 90.
from wal-g.