
pg_extractor's Introduction

PGExtractor is both a command line script and a Python class that provides more finely grained filtering options for PostgreSQL's pg_dump program. Each database object is extracted into its own file, organized into folders by type. This makes the database schema easier to use as a reference and better suited to checking into version control. By default data is not dumped, but it can be with a single option, either as plaintext or in pg_dump's custom format. Object filtering can be done directly via command line options or fed in from external text files. Regex pattern matching is also possible.
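The regex-based include/exclude filtering described above can be imagined along these lines (a sketch with a hypothetical `filter_objects` helper; the script's real option handling is documented in --help):

```python
import re

def filter_objects(names, include_patterns=None, exclude_patterns=None):
    """Keep names matching any include pattern (all names, if no include
    patterns are given), then drop names matching any exclude pattern.
    Hypothetical sketch, not pg_extractor's actual implementation."""
    def hit(name, patterns):
        return any(re.search(p, name) for p in patterns or [])
    kept = [n for n in names if not include_patterns or hit(n, include_patterns)]
    return [n for n in kept if not hit(n, exclude_patterns)]
```

For example, `filter_objects(['job_detail_p0', 'audit_log'], include_patterns=[r'^job_'])` keeps only the `job_` tables.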

See --help & --examples for a full list of available options and how to use the script.

The script uses only pg_dump/pg_dumpall to touch the database. pg_restore is used solely for generating DDL and never touches the database.

This script natively requires Python 3. The 3to2 script can be used to allow it to work with Python 2.7, but it is not guaranteed to work. https://pypi.python.org/pypi/3to2

$ 3to2 -w pg_extractor.py

Python 3 was chosen for its more consistent treatment of plaintext and binary file formats. Since this is a text processing script, that consistency makes development easier and more predictable. Also, Python 3 has been out since 2008 and all major OS distributions have packages available, so I'm doing my small part to help drive adoption of the new major version.

Several of the class methods are public and can be used to inspect a custom format binary dump file or apply some of the editing options.

Python 3.3.1 (default, Sep 25 2013, 19:29:01) 
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pg_extractor import PGExtractor
>>> p = PGExtractor()
>>> object_list = p.build_main_object_list("dumpfile.pgr")
>>> table_object_list = p.build_type_object_list(object_list, ['TABLE'])
>>> for t in table_object_list:
...     print(t)
... 
{'objname': 'job_detail_p0', 'objid': '238; 1259 596233', 'objowner': 'keith', 'objtype': 'TABLE', 'objschema': 'jobmon'}
{'objname': 'job_detail_p10', 'objid': '239; 1259 596244', 'objowner': 'keith', 'objtype': 'TABLE', 'objschema': 'jobmon'}
...

Remove the password hashes from an existing "pg_dumpall -r" roles file:

>>> p.remove_passwords("pg_dumpall_roles.sql")

New Version 2.x

Version 2.x is a complete rewrite of PG Extractor in python. Most of the configuration options are the same, but many have been changed for clarity, so please check the --help.

Incompatibilities with 1.x to be aware of when dropping in 2.x as a replacement

  • Requires Python 3
  • The "hostname" is no longer a default part of the directory structure created. If this is still desired, set the --hostnamedir option with whatever the existing directory is.
  • Built-in version control options are gone. They were rather fragile options and could easily lead to a whole lot of things getting checked into version control that should not have been. I've found it's easier (and safer) to manage version control check-ins separately. If these are really wanted, please create an issue on GitHub and I'll consider it if there's enough interest.
  • Removed --rolesdir option

New features:

  • Full Python 3 class object with public methods that may be useful on existing dump files
  • --jobs option to allow parallel object extraction
  • --remove_passwords option can remove the password hashes from an extracted roles file
  • --getdefaultprivs extracts the default privileges set for any roles that used ALTER DEFAULT PRIVILEGES
  • --delete cleans up empty folders properly
  • --wait option to allow a pause in object extraction. Helps reduce load when data is included in extraction.
  • --temp option to allow setting custom temporary working space
  • Sequences files can now include the statement to set the current value if data is output
  • Better support for when objects have mixed case names or special characters. Special characters in an object name turn into ,hexcode, to allow a valid system filename.
  • Rules & Triggers on views are now always included in the view file itself properly.
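The special-character handling in the list above can be sketched roughly like this (a hypothetical `sanitize_filename` helper; the exact character set and encoding pg_extractor uses may differ):

```python
def sanitize_filename(objname):
    """Replace characters that are unsafe in filenames with ,hexcode, markers.

    Hypothetical sketch: alphanumerics, underscores, and dots pass through;
    anything else becomes its hex code point wrapped in commas."""
    safe = []
    for ch in objname:
        if ch.isalnum() or ch in "_.":
            safe.append(ch)
        else:
            # e.g. '$' (code point 0x24) becomes ',24,'
            safe.append(",{:x},".format(ord(ch)))
    return "".join(safe)
```

For example, `sanitize_filename('my$table')` yields `'my,24,table'`, which is a valid filename on common filesystems.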

The version 1.x series written in Perl will no longer be developed. Only bug fixes to the existing code will be accepted.

pg_extractor's People

Contributors

4orbit, jobywalker, keithf4, mikestankavich, olasd, xzilla


pg_extractor's Issues

FR: allow running against an existing dump file

It would be great if one could point pg_extractor to an existing dump file as an alternative to the current behavior of always connecting to a running PostgreSQL server and taking a fresh dump internally. Inspecting backup files is one possible use case.

Problem with recent versions of Python

When running on recent versions of Python, we get an error that the buffer attribute does not exist on sys.stdout.

$ python3.8 ~/usr/pg_extractor/pg_extractor.py --getall --getsequence --orreplace --hostname . --host dbhost --dbname dbname
Creating temp dump file...
Extracting tables...
Extracting functions & aggregates...
Extracting sequences...
Traceback (most recent call last):
  File "/home/sguarin/usr/pg_extractor/pg_extractor.py", line 1500, in <module>
    p.or_replace()
  File "/home/sguarin/usr/pg_extractor/pg_extractor.py", line 797, in or_replace
    sys.stdout.buffer.write(
AttributeError: '_io.BufferedWriter' object has no attribute 'buffer'

When running the same on Python 3.4, everything runs successfully.

Add tests

It would be a thousand times easier to contribute to this codebase if there were tests with end to end coverage.

At the moment, it is nearly impossible to contribute since it is hard to see the impact of changes.

Feature Request: Foreign Servers + User Mappings

Kudos on the great work.

I was wondering if there is a way to dump foreign servers and user mappings as well?

I realize the security implications of credentials ending up in version control and that might be reason enough not to include the functionality.

Thanks,
Jeff

help file

This is certainly a minor issue...

The "help" file is generated from the POD material in the Perl script, right? If the in-script material is authoritative, it would probably be better to have a script that generates help using pod2text, that way people won't be tempted to make changes in two places.

Specifying schema include does not work

If I try to only dump objects from a specific schema, I get the following:

dejsi:pg_extractor rrj$ ./pg_extractor.py -U raro30 -d wcm --gettables -n cmpxy
Creating temp dump file...
Traceback (most recent call last):
  File "./pg_extractor.py", line 1212, in <module>
    p._create_temp_dump()
  File "./pg_extractor.py", line 841, in _create_temp_dump
    pg_dump_cmd.append(_build_filter_list("csv", self.args.schema_include, " --schema="))
NameError: global name '_build_filter_list' is not defined

orreplace flag doesn't actually work

When I run the new python pg_extractor.py on my database with the "--orreplace" option it doesn't actually try to add "OR REPLACE" to anything. I'm using this as my command:

pg_extractor --host xxxx --port 5432 --username postgres --dbname foo --getall --orreplace --schemadir --delete --basedir sql --nodbnamedir

When I put some debugging into the or_replace method I see only that it is looking in sql/functions but I have directories for each schema and it does not look in those.

PGPORT is ignored

PGHOST=xxx.aivencloud.com
PGPORT=23893
PGUSER=avnadmin
PGPASSWORD=xxx

Gives error:

./src/bin/pg_extractor.py --getall
Creating temp dump file...
Error in pg_dump command while creating template dump file: pg_dump: error: connection to database "avnadmin" failed: could not connect to server: Operation timed out
	Is the server running on host "xxx.aivencloud.com" (xxx) and accepting
	TCP/IP connections on port 5432?
Subprocess command called: ['pg_dump', '--format=custom', '--file=/var/folders/j0/ljpkc6ps1kb6x0ts9fmzmx3h0000gn/T/pg_extractork1des0sh', '--schema-only']

The key here is that it says that "TCP/IP connections on port 5432?" despite PGPORT configured to 23893.
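pg_dump itself honors PGPORT via libpq, so the likely culprit is the wrapper passing an explicit default port that overrides the environment. A sketch of how a wrapper could defer to PGPORT when no port is given on the command line (hypothetical `connection_args` helper, not pg_extractor's actual code):

```python
import os

def connection_args(cli_port=None, env=None):
    """Build pg_dump connection arguments, deferring to PGPORT when no
    port was given on the command line. Hypothetical sketch: omitting the
    option entirely also works, since libpq reads PGPORT itself."""
    env = os.environ if env is None else env
    args = []
    port = cli_port or env.get("PGPORT")
    if port:
        args.append("--port=" + str(port))
    return args
```

With `PGPORT=23893` set and no --port flag, this would produce `['--port=23893']` instead of hard-coding 5432.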

jobs option fails when extracting remaining objects

If I use --jobs like so:

./pg_extractor.py -U raro30 -d wcm -n cmpxy --jobs 4 --gettables --getviews --getfuncs --getschemata --gettypes --schemadir

After a while, I get:

Extracting functions & aggregates...
Extracting remaining objects...
Traceback (most recent call last):
  File "./pg_extractor.py", line 1211, in <module>
    extracted_files_list = p.create_extract_files(filtered_list)
  File "./pg_extractor.py", line 582, in create_extract_files
    j.start()
  File "/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/multiprocessing/process.py", line 101, in start
    assert self._popen is None, 'cannot start a process twice'
AssertionError: cannot start a process twice

The same with --jobs 2

The -h option can't be used to specify HOST

It seems to be argparse's fault that -h can't be used in a way convenient for a Postgres-related tool.

I think it's worth nudging them to make it possible to keep -h for the caller and use only --help (or -? instead of -h) via an option to ArgumentParser().

crash on line 152

I get a crash when running 2.0.0rc1 against one of my DBs:

Traceback (most recent call last):
  File "./pg_extractor.py", line 1201, in <module>
    main_object_list = p.build_main_object_list()
  File "./pg_extractor.py", line 152, in build_main_object_list
    object_dict = dict([('objid', obj_mapping.group('objid'))
AttributeError: 'NoneType' object has no attribute 'group'

What can I provide to analyze the problem? Comparing a --debug output, the leftover dump.pgr file, and a manual dump leaves the impression that processing had reached the end of the dump at that stage.

pg_extractor 2.0.0rc1
homebrewed Python 3.3.5
OSX 10.8.5
Postgres.app 9.3.2.0 with pg_dump 9.3.2 in path

Support for FOREIGN TABLES

When exporting a database with FOREIGN TABLES, they are listed by pg_restore -l but the objects are missing from the DDL files.

Checking python version...

Hi,
I realise one needs python v3+, and I noticed the version check at the top of the script.

However, I get this when I run against python < v3

$ -> which python
/usr/bin/python

$ -> python --version
Python 2.7.5

$ -> pg_extractor-master/pg_extractor.py
  File "pg_extractor-master/pg_extractor.py", line 690
    print(re.sub(r'^CREATE FUNCTION\b', "CREATE OR REPLACE FUNCTION", line), end="")
                                                                                ^
SyntaxError: invalid syntax

Add support for extensions

The dump just has the "CREATE EXTENSION name" entry, but it could still be useful for keeping track of what's installed.

Extract Schema without Grants

Hi,

We would like to generate the scripts for all schema objects of a given database, but we only have a user with limited access rights to do the job. One major grant that is missing is SELECT on pg_authid. When we execute the script with

python pg_extractor.py -N test -x --no_owner --getall --schemadir

we'd expect to get the scripts without grants and roles. Instead we get an error:

Error in pg_dumpall command while extracting roles: pg_dumpall: query failed: Error no access rights for relation pg_authid

Best,
Ingo

files created by pg_extractor may not be suitable for rebuilding the db

When extracting a table with a foreign key constraint, it is not possible to execute the created files one by one to restore the db, because you may lose FK constraints if the referenced table is not created first.

Is it possible to split the created scripts in a way that lets you execute them one by one and end up with a working db?

So first sequences, then tables, then primary keys, then foreign keys, then views, and so on.

Or is it possible to force Postgres to add the constraints as invalid and validate them afterwards (I think Oracle has that option)?

Changes in pg_dump/pg_restore 9.6 break RULE extraction

Version info
pg_extractor.py : 2.3.7
postgresql servers : 8.4, 9.4, 9.5 & 9.6
pg_dump : 9.6.1 (until recently 9.5.5)
pg_restore : 9.6.1 (until recently 9.5.5)

First, thanks for a great utility which I've been using successfully for a few years to take nightly snapshots of my databases' schema DDL. Recently I noticed some unexpected changes in the extracted DDL for my rules. They seemed to be switching back and forth between INSERT, UPDATE and DELETE variants.

Further investigation revealed that instead of there being one file per rule in ./rules/, there is only one file per view. So, for example, instead of seeing, as I did in the past:

schema_name.view_name1_update.sql
schema_name.view_name1_insert.sql
schema_name.view_name1_delete.sql

schema_name.view_name2_update.sql
schema_name.view_name2_insert.sql
schema_name.view_name2_delete.sql

I now see

schema_name.view_name1.sql
schema_name.view_name2.sql

The underlying cause seems to be a difference in the information stored in the dump file prepared using pg_dump 9.6.1.

If I run pg_restore --list on a dump file generated using pg_dump 9.5.5 using pg_restore 9.5.5 or pg_restore 9.6.1 I see the following output:

nnnn; nnnn nnnnnnnnn RULE schema_name rule_name owner_name

If I run pg_restore --list on a dump file generated using pg_dump 9.6.1 using pg_restore 9.6.1 I see the following output:

nnnn; nnnn nnnnnnnnn RULE schema_name view_name rule_name owner_name

That is, view_name has been inserted between schema_name and rule_name and consequently pg_extractor.py is extracting the wrong information when building the filename.
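Handling both pg_restore --list layouts can be done by trying the longer 9.6 form first and falling back to the 9.5 form. This is a sketch with hypothetical regexes; pg_extractor's real parser differs:

```python
import re

# 9.6 layout: "nnnn; nnnn nnnnnnn RULE schema_name view_name rule_name owner_name"
RULE_96 = re.compile(
    r'^(?P<objid>\d+; \d+ \d+) RULE (?P<schema>\S+) (?P<view>\S+) '
    r'(?P<rule>\S+) (?P<owner>\S+)$')
# 9.5 layout: "nnnn; nnnn nnnnnnn RULE schema_name rule_name owner_name"
RULE_95 = re.compile(
    r'^(?P<objid>\d+; \d+ \d+) RULE (?P<schema>\S+) '
    r'(?P<rule>\S+) (?P<owner>\S+)$')

def parse_rule_line(line):
    """Try the 9.6 layout first (one more token); fall back to 9.5.
    Returns a dict of named groups, or None if the line is not a RULE."""
    m = RULE_96.match(line) or RULE_95.match(line)
    return m.groupdict() if m else None
```

Because the 9.6 pattern requires one extra token, a 9.5-style line can never match it accidentally, so the fallback order is safe.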

--schemadir creates empty directories

The --schemadir option seems to have a bug. It correctly creates subdirectories for each schema under the database directory and fills them with the corresponding content, but it also creates empty schema directories in the working directory (at the same level as the database directory).

Extract funcs in single file?

Hi,

I'm trying to use your tool in the following scenario: I'm migrating a database from PG 7.4 to 9.3. This requires many small changes in almost all functions of the database.

I have restored an old version of the database on the new server, and I'm currently working on it. When the job is done, I'd like to be able to replay all the changes I made to the functions on a fresh restore, with all the latest data in it.

I'm currently using the following:

perl pg_extractor.pl -U postgres --dbname=mydb --getfuncs --n=public --orreplace

Works great, but having all functions in a single file would be easier for replaying the changes on the new server. Of course, I could use some script, or concatenate the files, but I was wondering how you would get this done with the current version of pg_extractor. I might have missed something!

Thanks for this great tool!

Philippe

duplicate COMMENTs on functions with the same name in different schemas

If you have two functions with the same name but in different schemas, you will end up with the comments for both in both function files. Example:

File for schema1.awesome_function() has:

Function schema1.awesome_function()
Comment on schema1.awesome_function()
Comment on schema2.awesome_function()

File for schema2.awesome_function() has:

Function schema2.awesome_function()
Comment on schema1.awesome_function()
Comment on schema2.awesome_function()
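One way to avoid the cross-schema mix-up is to require the COMMENT's schema to match the function's schema before appending it to the file. A hypothetical filter over entries shaped like the dicts returned by build_main_object_list earlier in this page (not the project's actual fix):

```python
def comments_for(object_list, schema, name):
    """Select only COMMENT entries whose target lives in the given schema
    and has the given bare name (ignoring the argument signature).
    Hypothetical sketch using the objtype/objschema/objname field names
    shown earlier."""
    return [o for o in object_list
            if o.get('objtype') == 'COMMENT'
            and o.get('objschema') == schema
            and o.get('objname', '').split('(')[0] == name]
```

With both awesome_function comments in the list, filtering on schema1 would return only the schema1 entry.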
