gpbackup-s3-plugin's Introduction

Using the S3 Storage Plugin with gpbackup and gprestore

The S3 plugin lets you use an Amazon Simple Storage Service (Amazon S3) location to store and retrieve backups when you run gpbackup and gprestore.

To use the S3 plugin, specify the location of the plugin, the AWS credentials, and the backup location in a configuration file. When you run gpbackup or gprestore, pass the configuration file with the --plugin-config option.

If you perform a backup operation with the gpbackup option --plugin-config, you must also specify the --plugin-config option when you restore the backup with gprestore.
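As a rough illustration of that rule, the sketch below builds matching gpbackup and gprestore argument lists that share one configuration file. The helper names and the timestamp value are hypothetical. Only the --dbname, --plugin-config, and --timestamp flags come from the gpbackup and gprestore command lines.

```python
from typing import List


def backup_args(dbname: str, plugin_config: str) -> List[str]:
    """Argument list for a gpbackup run that uses a storage plugin."""
    return ["gpbackup", "--dbname", dbname, "--plugin-config", plugin_config]


def restore_args(timestamp: str, plugin_config: str) -> List[str]:
    """Argument list for restoring that backup; the same config file is passed again."""
    return ["gprestore", "--timestamp", timestamp, "--plugin-config", plugin_config]


config = "/home/gpadmin/s3-test-config.yaml"
print(" ".join(backup_args("demo", config)))
# The timestamp below is a made-up example; use the one reported by gpbackup.
print(" ".join(restore_args("20240101120000", config)))
```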

The S3 plugin supports both AWS and custom storage servers that implement the S3 interface.

Prerequisites

The project requires the Go programming language, version 1.13 or higher. See the Go documentation for installation, usage, and configuration instructions.
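A minimal sketch of checking that requirement before building, assuming the version string has the form printed by `go version` (for example `go version go1.19.10 linux/amd64`). The function names are hypothetical and are not part of the project.

```python
import re
from typing import Tuple

MIN_GO = (1, 13)  # minimum version stated in the prerequisites


def parse_go_version(output: str) -> Tuple[int, ...]:
    """Extract the numeric version from a `go version` output line."""
    match = re.search(r"\bgo(\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError(f"no Go version found in: {output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def go_is_new_enough(output: str) -> bool:
    """Compare major.minor against the minimum required version."""
    return parse_go_version(output)[:2] >= MIN_GO


print(go_is_new_enough("go version go1.12.7 linux/amd64"))   # False
print(go_is_new_enough("go version go1.19.10 linux/amd64"))  # True
```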

Downloading

go get github.com/greenplum-db/gpbackup-s3-plugin/...

Building and installing binaries

Change your working directory to the gpbackup-s3-plugin source directory downloaded above.

Build

make build

This will build the gpbackup_s3_plugin binary in $HOME/go/bin.

Install

make install

This will install the gpbackup_s3_plugin binary on all segment hosts. Note that the GPDB environment must be sourced for this to work.

Test

make test

This runs the unit tests.

S3 Storage Plugin Configuration File Format

The configuration file specifies the absolute path to the gpbackup_s3_plugin executable, AWS connection credentials, and S3 location.

The configuration file must be a valid YAML document in the following format:

executablepath: <absolute-path-to-gpbackup_s3_plugin>
options: 
  region: <aws-region>
  endpoint: <s3-endpoint>
  aws_access_key_id: <aws-user-id>
  aws_secret_access_key: <aws-user-id-key>
  bucket: <s3-bucket>
  folder: <s3-location>
  encryption: [on|off]
  http_proxy: <http-proxy>

executablepath is the absolute path to the plugin executable (for example, use the fully expanded path of $GPHOME/bin/gpbackup_s3_plugin).

Below are the S3 plugin options:

Option Name                      Description
region                           AWS region (ignored if endpoint is specified)
endpoint                         Endpoint of a server implementing the S3 interface
aws_access_key_id                AWS S3 ID to access the S3 bucket location that stores backup files
aws_secret_access_key            AWS S3 passcode for the S3 ID to access the S3 bucket location
bucket                           Name of the S3 bucket. The bucket must exist with the necessary permissions
folder                           S3 location for backups. During a backup operation, the plugin creates the S3 location if it does not exist in the S3 bucket
encryption                       Enable or disable SSL encryption to connect to S3. Valid values are on and off. On by default
http_proxy                       Your HTTP proxy URL
backup_max_concurrent_requests   Concurrency level for any file's backup request
backup_multipart_chunksize       Maximum buffer/chunk size for multipart transfers during backup
restore_max_concurrent_requests  Concurrency level for any file's restore request
restore_multipart_chunksize      Maximum buffer/chunk size for multipart transfers during restore
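The option list above can be turned into a quick sanity check before running a backup. The sketch below is only an illustration: it works on a dictionary that mirrors the YAML structure (loading the file is left out), and the specific rules it applies (region or endpoint present, bucket and folder present) are assumptions drawn from the descriptions, not the plugin's own validation logic.

```python
import posixpath
from typing import Any, Dict, List

# Option names taken from the options table.
KNOWN_OPTIONS = {
    "region", "endpoint", "aws_access_key_id", "aws_secret_access_key",
    "bucket", "folder", "encryption", "http_proxy",
    "backup_max_concurrent_requests", "backup_multipart_chunksize",
    "restore_max_concurrent_requests", "restore_multipart_chunksize",
}


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if not posixpath.isabs(str(config.get("executablepath", ""))):
        problems.append("executablepath must be an absolute path")
    options = config.get("options") or {}
    for name in sorted(set(options) - KNOWN_OPTIONS):
        problems.append(f"unknown option: {name}")
    # Assumption: at least one of region/endpoint is needed to locate the service.
    if not options.get("region") and not options.get("endpoint"):
        problems.append("set region or endpoint")
    # Assumption: bucket and folder are always required.
    for required in ("bucket", "folder"):
        if not options.get(required):
            problems.append(f"missing option: {required}")
    # Some YAML loaders turn on/off into booleans, so accept both forms.
    if options.get("encryption", "on") not in ("on", "off", True, False):
        problems.append("encryption must be on or off")
    return problems


example = {
    "executablepath": "/usr/local/greenplum-db/bin/gpbackup_s3_plugin",  # hypothetical path
    "options": {"region": "us-west-2", "bucket": "gpdb-backup", "folder": "test/backup3"},
}
print(check_config(example))  # []
```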

Example

This is an example S3 storage plugin configuration file that is used in the next gpbackup example command. The name of the file is s3-test-config.yaml.

executablepath: $GPHOME/bin/gpbackup_s3_plugin
options: 
  region: us-west-2
  aws_access_key_id: test-s3-user
  aws_secret_access_key: asdf1234asdf
  bucket: gpdb-backup
  folder: test/backup3

This gpbackup example backs up the database demo using the S3 storage plugin. The absolute path to the S3 storage plugin configuration file is /home/gpadmin/s3-test-config.yaml.

gpbackup --dbname demo --single-data-file --plugin-config /home/gpadmin/s3-test-config.yaml

The S3 storage plugin writes the backup files to this S3 location in the AWS region us-west-2.

gpdb-backup/test/backup3/backups/YYYYMMDD/YYYYMMDDHHMMSS/
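The layout above can be sketched as a small helper that derives the location from the bucket, the folder option, and a backup timestamp. This is an illustration of the naming pattern only, not the plugin's code. The function name and the sample date are hypothetical.

```python
from datetime import datetime


def backup_prefix(bucket: str, folder: str, when: datetime) -> str:
    """Build <bucket>/<folder>/backups/YYYYMMDD/YYYYMMDDHHMMSS/ for a backup time."""
    day = when.strftime("%Y%m%d")
    stamp = when.strftime("%Y%m%d%H%M%S")
    return f"{bucket}/{folder.strip('/')}/backups/{day}/{stamp}/"


print(backup_prefix("gpdb-backup", "test/backup3", datetime(2024, 1, 2, 3, 4, 5)))
# gpdb-backup/test/backup3/backups/20240102/20240102030405/
```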

Notes

The S3 storage plugin application must be in the same location on every Greenplum Database host. The configuration file is required only on the coordinator host.

Using Amazon S3 to back up and restore data requires an Amazon AWS account with access to the Amazon S3 bucket. The Amazon S3 bucket permissions required are Upload/Delete for the S3 user ID that uploads the files and Open/Download and View for the S3 user ID that accesses the files.

gpbackup-s3-plugin's People

Contributors

ajr-vmware, bmdoil, brosander, chrishajas, gp-releng, hughcapet, jimmyyih, jmcatamney, khuddlefish, kyeap-vmware, roicos, shivzone, soumyadeep2007

gpbackup-s3-plugin's Issues

Failed to build s3 plugin

os: "Rocky Linux 8.8 (Green Obsidian)"
go: go version go1.19.10 linux/amd64

echo $LD_PRELOAD
/lib64/libz.so.1
ll /lib64/libz.so.1
lrwxrwxrwx 1 root root 14 May 17  2023 /lib64/libz.so.1 -> libz.so.1.2.11

I did:

git clone https://github.com/greenplum-db/gpbackup-s3-plugin.git
cd gpbackup-s3-plugin/
git checkout 1.10.2
make build

and received:

GO111MODULE=on  go mod download
GO111MODULE=on  go build -o /home/gpadmin/go/bin/gpbackup_s3_plugin -ldflags "-X github.com/greenplum-db/gpbackup-s3-plugin/s3plugin.Version=1.10.2"
# github.com/greenplum-db/gpbackup-s3-plugin
/usr/bin/ld: Relink `/usr/lib64/libbfd-2.30-119.el8.so' with `/lib64/libz.so.1' for IFUNC symbol `crc32_z'

I upgraded zlib and zlib-devel, and also downloaded binutils-devel and binutils.

But after that I get the same error:

GO111MODULE=on  go mod download
GO111MODULE=on  go build -o /home/gpadmin/go/bin/gpbackup_s3_plugin -ldflags "-X github.com/greenplum-db/gpbackup-s3-plugin/s3plugin.Version=1.10.2"
# github.com/greenplum-db/gpbackup-s3-plugin
/usr/bin/ld: Relink `/usr/lib64/libbfd-2.30-119.el8.so' with `/lib64/libz.so.1' for IFUNC symbol `crc32_z'

`make build` and also `gpbackup-s3-plugin --version` cause error.

Environment

OS Ver:

#uname
Linux
#uname -r
3.10.0-1160.25.1.el7.x86_64

GO ver:

#go version
go version go1.17.3 linux/amd64

gpbackup-s3-plugin version 1.7.0

Problem

I ran the build as described in README.MD:

#echo "$GOPATH" 
/root/go
#go get github.com/greenplum-db/gpbackup/...
#cd $GOPATH/src/github.com/greenplum-db/gpbackup-s3-plugin
# make build
fatal: not a git repository (or any of the parent directories): .git
GO111MODULE=on  go mod download
GO111MODULE=on  go build -o /root/go/bin/gpbackup_s3_plugin -ldflags "-X github.com/greenplum-db/gpbackup-s3-plugin/s3plugin.Version="

The build process still creates an executable file, but it cannot print its version:

#$GOPATH/bin/gpbackup_s3_plugin --version
Incorrect Usage. flag provided but not defined: -version

Workaround 0

Clone the repository from GitHub and run make from there.

#git clone https://github.com/greenplum-db/gpbackup-s3-plugin.git gpbackup-s3-plugin
#cd gpbackup-s3-plugin
#make build
GO111MODULE=on  go mod download
GO111MODULE=on  go build -o /root/go/bin/gpbackup_s3_plugin -ldflags "-X github.com/greenplum-db/gpbackup-s3-plugin/s3plugin.Version=1.7.0"

And everything is fine:

#$GOPATH/bin/gpbackup_s3_plugin --version
gpbackup_s3_plugin version 1.7.0

Workaround 1

Change one line in Makefile:

GIT_VERSION := $(shell git describe --tags | perl -pe 's/(.*)-([0-9]*)-(g[0-9a-f]*)/\1+dev.\2.\3/')

to

GIT_VERSION=1.7.0
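To make the Makefile line easier to follow, here is a sketch of what the perl substitution does, written with Python's `re` module. The sample inputs are illustrative. `git describe --tags` prints a bare tag on a tagged commit and `<tag>-<commits>-g<hash>` otherwise. Outside a git checkout it prints nothing to stdout, which matches the empty `Version=` in the build output above.

```python
import re


def dev_version(describe_output: str) -> str:
    """Mimic: perl -pe 's/(.*)-([0-9]*)-(g[0-9a-f]*)/\\1+dev.\\2.\\3/'"""
    return re.sub(
        r"(.*)-([0-9]*)-(g[0-9a-f]*)",
        r"\1+dev.\2.\3",
        describe_output,
        count=1,  # perl's s/// without /g replaces only the first match
    )


print(dev_version("1.7.0"))             # 1.7.0 (tagged commit, unchanged)
print(dev_version("1.2.0-3-g51d8dbc"))  # 1.2.0+dev.3.g51d8dbc
print(repr(dev_version("")))            # '' (no git metadata, empty version)
```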

Note

I do not know much about Go, and that line has been in the Makefile since a commit on May 1, 2018, so maybe I did something wrong. But the only source code I could find after running the go get command was in the "$GOPATH/src/github.com/greenplum-db/gpbackup-s3-plugin" folder. Please tell me what I should have done to get the correct result.

Maybe you should decouple your Makefile from git. In other projects on GitHub I usually find a small text or source code file that contains the version number, and both the package build process and git commits refer to that file.

Performance issue with v1.17.0

Having run tests with Greenplum 5.x and 6.x and the latest s3 plugin, 1.17.0, I have noticed a huge drop in restore performance.

Restores stream at a much lower rate, and the process hangs after 20 minutes.

I have tried varying the parameters, to no avail, and have run tests with different versions of gpbackup.

Rolling back to 1.16.0 seems to fix all the issues.

upload failed when `backup_multipart_chunksize` is much smaller than the size of compressed data in each segment

TL;DR

When backup_multipart_chunksize is much smaller than the size of the compressed data in each segment, lines like the following flood the gpbackup-s3-plugin log file, and the upload finally fails after 10 tries:

[DEBUG]:-Https request attempt 0 failed. Next attempt in 49.507618ms. RequestTimeout: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.

Details

/path/to/gpbakcup-s3-plugin-config.yaml

executablepath: /path/to/gpbackup_s3_plugin
options:
  region: <region>
  aws_access_key_id: <aws_access_key_id>
  aws_secret_access_key: <aws_secret_access_key>
  bucket: <bucket_name>
  folder: <folder_name>
  encryption: off
  backup_multipart_chunksize: 5MB

The target table is "large_table" in the database "testdb". Each segment holds about 2GB of this table's data after gzip compression.

We back up "large_table" and upload the data to S3 with the following command:

gpbackup --dbname testdb --plugin-config /path/to/gpbakcup-s3-plugin-config.yaml --debug --include-table public.large_table

The behavior described in the "TL;DR" section then appears, and the backup eventually fails.
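For a sense of scale, the arithmetic below estimates how many parts one segment's data is split into at a given chunk size. It assumes that "5MB" and "2GB" are binary units (MiB and GiB), which the report does not confirm, and it does not attempt to explain the timeouts themselves.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def part_count(data_bytes: int, chunk_bytes: int) -> int:
    """Number of chunks needed to cover data_bytes (integer ceiling division)."""
    return (data_bytes + chunk_bytes - 1) // chunk_bytes


print(part_count(2 * GIB, 5 * MIB))    # 410 parts at the reported chunk size
print(part_count(2 * GIB, 500 * MIB))  # 5 parts with a much larger chunk size
```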

s3 plugin doesn't work with s3-compatible storage

The latest changes from bce4362 break compatibility with S3-compatible storage. The following error occurs:

20231208:11:20:27 gpbackup:givi:samsung-ivanov:396025-[CRITICAL]:-exit status 1: 20231208:11:20:27 gpbackup_s3_plugin:givi:samsung-ivanov:396382-[ERROR]:-NoSuchBucket: The specified bucket does not exist.
        status code: 404, request id: b33f5d6b7077be48, host id:

The error occurs because there is no bucket name in the URL.
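For context, S3 clients can place the bucket name in a request URL in two ways, and some S3-compatible servers accept only the path-style form. The sketch below shows the two shapes. The endpoint and object key are hypothetical, and the report does not state which form the plugin sends after bce4362.

```python
def path_style_url(endpoint: str, bucket: str, key: str) -> str:
    """Bucket appears as the first path segment: <endpoint>/<bucket>/<key>."""
    return f"{endpoint.rstrip('/')}/{bucket}/{key}"


def virtual_hosted_url(endpoint: str, bucket: str, key: str) -> str:
    """Bucket appears as a subdomain: <scheme>://<bucket>.<host>/<key>."""
    scheme, host = endpoint.rstrip("/").split("://", 1)
    return f"{scheme}://{bucket}.{host}/{key}"


endpoint = "https://s3.example.com"  # hypothetical S3-compatible endpoint
print(path_style_url(endpoint, "gpdb-backup", "test/backup3/file"))
# https://s3.example.com/gpdb-backup/test/backup3/file
print(virtual_hosted_url(endpoint, "gpdb-backup", "test/backup3/file"))
# https://gpdb-backup.s3.example.com/test/backup3/file
```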

Dependency download fails with Go version 1.12

This is presumably the same as issue greenplum-db/gpbackup#326

Dependency download fails with Go version 1.12.7

go mod download
warning: pattern "all" matched no module dependencies
go build -tags 'gpbackup_s3_plugin' -o /tmp/s3plugin/bin/gpbackup_s3_plugin -ldflags "-X github.com/greenplum-db/gpbackup-s3-plugin/s3plugin.Version=1.2.0+dev.3.g51d8dbc"
# github.com/greenplum-db/gpbackup-s3-plugin
./gpbackup_s3_plugin.go:15:18: cannot use cli.BoolFlag literal (type cli.BoolFlag) as type cli.Flag in assignment:
	cli.BoolFlag does not implement cli.Flag (Apply method has pointer receiver)
./gpbackup_s3_plugin.go:23:15: cannot use []cli.Command literal (type []cli.Command) as type []*cli.Command in assignment
./gpbackup_s3_plugin.go:66:4: cannot use s3plugin.GetAPIVersion (type func(*cli.Context)) as type cli.ActionFunc in field value
Makefile:42: recipe for target 'build' failed
make: *** [build] Error 2

Dependency download works with Go Version 1.13.4
