mla / pg_sample
PostgreSQL utility for creating a small, sample database from a larger one
I'm trying to dump a Django project, so I used --limit="django_* = *,table_foo=15,*=15", which I expected to dump 15 rows from table_foo and all rows from every table starting with django_ (namely django_migrations and django_site). After inspecting the result, I found that only 15 rows were dumped from the django_* tables.
First of all, thank you for your great work!
In case one is not familiar with Perl and its ecosystem but wants pg_sample to just work, I'd like to describe my Docker setup. Maybe it will save someone a couple of hours (or days). I'm not a Docker expert, so this configuration may look a bit silly, but it works.
I have a database_sample directory with the following Dockerfile:
FROM postgres:9.6.8
WORKDIR /database_sample/
COPY ./pg_sample ./pg_sample
COPY ./MyConfig.pm /.cpan/CPAN/MyConfig.pm
RUN apt-get --yes update && \
    apt-get --yes install gcc libdbi-perl libpq-dev build-essential autoconf automake libtool gdb
RUN cpan App::cpanminus
RUN cpanm --notest DBD::Pg
CMD ["sh", "-c", "./pg_sample $DATABASE_NAME --file=$DATABASE_NAME.sql --limit=$LIMIT --verbose --username=$PGUSER --password=$PGPASSWORD"]
version: '3.5'
services:
  database_sample:
    build: .
    volumes:
      - ./:/database_sample
    environment:
      - PGHOST=123.45.67.89
      - PGPORT=5432
      - PGUSER=admin
      - PGPASSWORD=
      - DATABASE_NAME=
      - LIMIT=1000
And the MyConfig.pm (auto-generated; it makes the installation silent):
$CPAN::Config = {
'applypatch' => q[],
'auto_commit' => q[0],
'build_cache' => q[100],
'build_dir' => q[/root/.cpan/build],
'build_dir_reuse' => q[0],
'build_requires_install_policy' => q[yes],
'bzip2' => q[/bin/bzip2],
'cache_metadata' => q[1],
'check_sigs' => q[0],
'commandnumber_in_prompt' => q[1],
'connect_to_internet_ok' => q[1],
'cpan_home' => q[/root/.cpan],
'curl' => q[],
'ftp' => q[],
'ftp_passive' => q[1],
'ftp_proxy' => q[],
'getcwd' => q[cwd],
'gpg' => q[/usr/bin/gpg],
'gzip' => q[/bin/gzip],
'halt_on_failure' => q[0],
'histfile' => q[/root/.cpan/histfile],
'histsize' => q[100],
'http_proxy' => q[],
'inactivity_timeout' => q[0],
'index_expire' => q[1],
'inhibit_startup_message' => q[0],
'keep_source_where' => q[/root/.cpan/sources],
'load_module_verbosity' => q[none],
'lynx' => q[],
'make' => q[/usr/bin/make],
'make_arg' => q[],
'make_install_arg' => q[],
'make_install_make_command' => q[/usr/bin/make],
'makepl_arg' => q[INSTALLDIRS=site],
'mbuild_arg' => q[],
'mbuild_install_arg' => q[],
'mbuild_install_build_command' => q[./Build],
'mbuildpl_arg' => q[--installdirs site],
'ncftp' => q[],
'ncftpget' => q[],
'no_proxy' => q[],
'pager' => q[/bin/more],
'patch' => q[/usr/bin/patch],
'perl5lib_verbosity' => q[none],
'prefer_external_tar' => q[1],
'prefer_installer' => q[MB],
'prefs_dir' => q[/root/.cpan/prefs],
'prerequisites_policy' => q[follow],
'recommends_policy' => q[1],
'scan_cache' => q[atstart],
'shell' => undef,
'show_unparsable_versions' => q[0],
'show_upload_date' => q[0],
'show_zero_versions' => q[0],
'suggests_policy' => q[0],
'tar' => q[/bin/tar],
'tar_verbosity' => q[none],
'term_is_latin' => q[1],
'term_ornaments' => q[1],
'test_report' => q[0],
'trust_test_report_history' => q[0],
'unzip' => q[],
'urllist' => [q[http://www.cpan.org/]],
'use_prompt_default' => q[0],
'use_sqlite' => q[0],
'version_timeout' => q[15],
'wget' => q[],
'yaml_load_code' => q[0],
'yaml_module' => q[YAML],
};
1;
__END__
And then I run it like this:
docker-compose run \
-e DATABASE_NAME=<dbname> \
-e PGPASSWORD=<password> \
database_sample
Unicode support:
use utf8;
binmode(STDOUT, ':utf8');
Hi @mla. This tool is great for some of our smaller databases. Thanks!
However, with a larger data warehouse (~2000 tables), we have 31 tables where the combination of schema_name and table_name bombs during population of the sample schema, due to the 63-character limit imposed by PostgreSQL's maximum identifier length (NAMEDATALEN). I suspect it will also cause problems when retrieving the sample data for the COPY FROM stdin portion of sampling.
Here is the error I get while populating the sample schema (using the "--sample_schema pgs" option):
Creating table "pgs"."marketing_analytics_transunion_factortrust_combined_ccb_scoring_legacy" 1000
Creating table "pgs"."marketing_analytics_transunion_factortrust_combined_ccb_scoring_python38" DBD::Pg::db do failed: ERROR: relation "marketing_analytics_transunion_factortrust_combined_ccb_scoring" already exists at ./pg_sample line 300.
main::__ANON__("DBD::Pg::db do failed: ERROR: relation \"marketing_analytics_"..., DBI::db=HASH(0x7fc45e828930), undef) called at ./pg_sample line 634
Using the --keep option, I was able to see that the first table name (_legacy suffix) gets truncated at 63 characters to "marketing_analytics_transunion_factortrust_combined_ccb_scoring"; the subsequent table name is identical except for the _python38 suffix, so after truncation it appears that the table already exists.
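The collision is mechanical: PostgreSQL silently truncates identifiers to NAMEDATALEN - 1 = 63 bytes, so any two names sharing their first 63 bytes map to the same relation. A minimal sketch of that truncation (pg_truncate_ident is my own helper name, and it covers only the single-byte-character case):

```python
NAMEDATALEN = 64  # PostgreSQL compile-time constant

def pg_truncate_ident(ident: str) -> str:
    """Mimic PostgreSQL's silent identifier truncation (single-byte chars)."""
    return ident[:NAMEDATALEN - 1]

legacy = "marketing_analytics_transunion_factortrust_combined_ccb_scoring_legacy"
python38 = "marketing_analytics_transunion_factortrust_combined_ccb_scoring_python38"

# Both names share their first 63 characters, so they collide after truncation.
print(pg_truncate_ident(legacy) == pg_truncate_ident(python38))  # True
```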
Do you have ideas and/or a potential patch that could address this? I've tried using --limit="marketing_analytics.transunion_\d+=0"
but that still tries to create the schema. I could skip this marketing_analytics schema by running through the others individually with --schema option but that would only help for 6 of the 31 affected. The other 25 are dispersed across 9 other schemas.
Thanks for any help you can provide.
Hi!
Thanks for the library; it works really well (from the dev branch).
In the Dockerfile, instead of
ENTRYPOINT ["tail"]
CMD ["-f","/dev/null"]
you could have
ENTRYPOINT ["pg_sample"]
This way it's easier to run in a single command, without having to create a named container.
docker build -t pg_sample .
docker run -it --rm --network=host -v "$(PWD):/io" --entrypoint="" pg_sample mydb --host=127.0.0.1 --username=postgres --password=postgres --file /io/output.sql --limit="* = 5" --data-only
Additionally, configuring auto-builds on Docker Hub is super easy, and that would even avoid having to check out and build the Dockerfile locally.
If you're willing to accept these changes, I can prepare a PR.
Would it be possible to select rows based on very simple criteria? I mean, I have a database with one very large table, and I would like to select all rows from all tables except the data table, where I want 1000000 rows ORDER BY timestamp DESC.
Thanks for the library.
Hello!
Thank you for this wonderful project. I've been playing around with pg_sample and it is simply great. This is not really an issue but more a question about the order of operations.
First, if I understand correctly, you cannot use pg_sample against a database on a remote host if you don't have write permission on that database, is that correct?
If that first assumption is correct, then the order of operations is pg_dump, pg_restore, and then pg_sample?
Lastly, why does the script need write permission? I'm sorry if I'm missing something obvious, but I'm fairly new to databases in general.
All the best!
Hello,
I'm on pg_sample v1.10, although pg_sample --version displays v1.13.
Using the following command: pg_sample-master/pg_sample --limit=10000 --file=sample.dump --trace test
I get:
<- quote_identifier('_pg_sample_idx7')= ( '"_pg_sample_idx7"' ) [1 items] at pg_sample line 683
Use of uninitialized value $sample_fk_table in concatenation (.) or string at pg_sample-master/pg_sample line 685.
!! ERROR: 7 'ERROR: syntax error at or near "("
LIGNE 1 : CREATE INDEX "_pg_sample_idx7" ON (col1, col2)
^' (err#0)
<- do(' CREATE INDEX "_pg_sample_idx7" ON (col1, col2) ')= ( undef ) [1 items] at pg_sample line 685
-> HandleError on DBI::db=HASH(0x15c1a10) via CODE(0x157c120) (undef)
DBD::Pg::db do failed: ERROR: syntax error at or near "("
LIGNE 1 : CREATE INDEX "_pg_sample_idx7" ON (col1, col2)
^ at pg_sample-master/pg_sample line 300.
main::__ANON__('DBD::Pg::db do failed: ERROR: syntax error at or near "("\x{a}LI...', 'DBI::db=HASH(0x15c1ae8)', undef) called at pg_sample-master/pg_sample line 685
ERROR: 7 'ERROR: syntax error at or near "("
LIGNE 1 : CREATE INDEX "_pg_sample_idx7" ON (col1, col2)
^' (err#0)
The "CREATE INDEX" is for a partitioned table, and it appears to break when it reaches the 7th partition.
Any ideas?
Regards
Given the following script:
./pg_sample mydb | psql -v ON_ERROR_STOP=1 mydb-sample
I received an error (see below). I redacted a bunch of private information about my company's schema; I hope I'm providing enough information for you to get an idea of what's causing this. Adding the --trace option gave me the following:
!! ERROR: 7 "ERROR: could not identify an equality operator for type json
LINE 3: ...[REDACTED COLUMN NAMES], t1."argume...
^" (err#0)
<- do("
INSERT INTO "_pg_sample"."004d__public_[redacted_table_A]"
SELECT DISTINCT t1."id", [REDACTING COLUMNS TO PROTECT PRIVATE INFORMATION],
t1."arguments", [REDACTING COLUMNS TO PROTECT PRIVATE INFORMATION]
FROM [redacted_table_A] t1
JOIN "_pg_sample"."004a__public_[redacted_table_B]" f1 ON (f1.[redacted_fk_column_id] = t1.id)
LEFT JOIN "_pg_sample"."004d__public_[redacted_table_A]" s1 ON (t1.id = s1.id)
WHERE s1.id IS NULL
")= ( undef ) [1 items] at pg_sample line 836
-> HandleError on DBI::db=HASH(0x5f0c02df48d0) via CODE(0x5f0c02d38f60) (undef)
<- can(CARP_TRACE) = 0 (? 0)
DBD::Pg::db do failed: ERROR: could not identify an equality operator for type json
LINE 3: ...[REDACTED COLUMN NAMES], t1."argume...
^ at ./pg_sample line 336.
main::__ANON__("DBD::Pg::db do failed: ERROR: could not identify an equality"..., DBI::db=HASH(0x5f0c02df4ae0), undef) called at ./pg_sample line 836
ERROR: 7 "ERROR: could not identify an equality operator for type json
LINE 3: ...[REDACTED COLUMN NAMES], t1."argume...
^" (err#0)
<- DESTROY(DBI::st=HASH(0x5f0c02bca838))= ( undef ) [1 items] at Carp.pm line 290
!! The ERROR '7' was CLEARED by call to do method
<- do('DROP SCHEMA _pg_sample CASCADE')= ( '0E0' ) [1 items] at pg_sample line 924
! <- DESTROY(DBI::db=HASH(0x5f0c02df4ae0))= ( undef ) [1 items] during global destruction
The t1.arguments column has the json data type, and having pgAdmin generate a CREATE TABLE script for the table containing the arguments column gives the following for that particular column:
arguments json NOT NULL DEFAULT '{}'::json,
I think this should be enough information to recreate the problem. It seems pretty straightforward: json columns can't be compared for equality, which would make sense unless you could compare some kind of hash of the value instead.
Would omitting the DISTINCT clause for tables with these column types be a good workaround in this situation?
A table with a json column should be enough to recreate the error, right? If needed, I can provide more information and maybe some sample data from the arguments column itself, but I'd like to avoid that unless I can get permission first.
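For what it's worth, the failure and two possible workarounds are easy to sketch in plain SQL (this is only an illustration, not pg_sample's code; the table t is hypothetical):

```sql
CREATE TABLE t (id int PRIMARY KEY, arguments json NOT NULL DEFAULT '{}'::json);

-- Fails: plain json has no equality operator, so DISTINCT cannot compare rows.
-- SELECT DISTINCT t1.* FROM t t1;

-- Workaround 1: deduplicate on the primary key instead of the whole row.
SELECT DISTINCT ON (t1.id) t1.* FROM t t1 ORDER BY t1.id;

-- Workaround 2: cast to jsonb, which does have equality (note: jsonb loses key
-- order and duplicate keys, so it is not byte-identical to the original json).
SELECT DISTINCT t1.id, t1.arguments::jsonb FROM t t1;
```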
I'm using PostgreSQL 9.2.15-1PGDG.rhel6 on CentOS 6.7, 64-bit.
When executing the command:
./pg_sample dbname --data-only --file=test.sql --limit="ja_admins = 1000"
I got the following error:
Creating table "_pg_sample"."pg_catalog_pg_statistic" DBD::Pg::db do failed: ERROR: column "stavalues1" has pseudo-type "anyarray" at ./pg_sample line 296
main::__ANON__('DBD::Pg::db do failed: ERROR: column "stavalues1" has pseudo...', 'DBI::db=HASH(0xfdec08)', undef) called at ./pg_sample line 538
Dropping sample schema _pg_sample
Done.
Using the --trace option:
)= undef at pg_sample line 538
-> HandleError on DBI::db=HASH(0x289b660) via CODE(0x2858218) (undef)
DBD::Pg::db do failed: ERROR: column "stavalues1" has pseudo-type "anyarray" at ./pg_sample line 296
main::__ANON__('DBD::Pg::db do failed: ERROR: column "stavalues1" has pseudo...', 'DBI::db=HASH(0x289b738)', undef) called at ./pg_sample line 538
ERROR: 7 'ERROR: column "stavalues1" has pseudo-type "anyarray"' (err#0)
<- DESTROY(DBI::st=HASH(0x289bb70))= undef at Carp.pm line 45
Dropping sample schema _pg_sample
!! The ERROR '7' was CLEARED by call to do method
<- do('DROP SCHEMA _pg_sample CASCADE')= '0E0' at pg_sample line 713
Done.
! <- DESTROY(DBI::db=HASH(0x289b738))= undef at pg_sample line 45 during global destruction
What can I do to solve this issue?
Hey,
As the title asks, is there a way to export a sample of the db based on the relationships and the data of a single table?
Example: start from table A where the given query is A.id=1, follow all the tables' foreign keys, and export the relevant data?
Thanks in advance!
Great tool! ✋
I'm facing the following issue when trying to load. I'm on PG 14, running this command (mini.sql was generated by pg_sample), and I get a lot of errors:
psql:mini.sql:254159: ERROR: permission denied: "RI_ConstraintTrigger_c_11890996" is a system trigger
and before that a lot of integrity errors. Any suggestions?
Hi @mla!
Thanks for the awesome job you have done delivering this nice tool ;-)
I found it very useful. I have an idea for an improvement (I just have to grasp a bit of Perl):
- the --schema param would behave the same as in pg_dump (ideally, if not too complicated, allowing multiple repetitions of --schema, or the short -n to follow the pg_dump naming, to specify multiple schemas for the script to consider)
- --sample_schema would take over the current --schema functionality of pg_sample
Tell me what you think: what limitations (both Perl and Postgres) and edge cases should I consider when tackling this issue?
Another improvement, probably sharing a common codebase, would be support for --table level granularity.
-- A basic run of v1.07 like
-- ./pg_sample --file=brick_testdb_sample.out --limit=1000 --username=postgres testdb
-- yields the following error in the db:
LOG: statement: CREATE SCHEMA _pg_sample
ERROR: column t.spclocation does not exist at character 488
-- the failing query
SELECT NULL::text AS "TABLE_CAT"
, quote_ident(n.nspname) AS "TABLE_SCHEM"
, quote_ident(c.relname) AS "TABLE_NAME"
, CASE
WHEN c.relkind = 'v' THEN
CASE WHEN quote_ident(n.nspname) ~ '^pg_' THEN 'SYSTEM VIEW' ELSE 'VIEW' END
ELSE
CASE WHEN quote_ident(n.nspname) ~ '^pg_' THEN 'SYSTEM TABLE' ELSE 'TABLE' END
END AS "TABLE_TYPE"
, d.description AS "REMARKS" , quote_ident(t.spcname) AS "pg_tablespace_name", quote_ident(t.spclocation) AS "pg_tablespace_location"
FROM pg_catalog.pg_class AS c
LEFT JOIN pg_catalog.pg_description AS d
ON (c.relfilenode = d.objoid AND d.objsubid = 0)
LEFT JOIN pg_catalog.pg_namespace n ON (n.oid = c.relnamespace) LEFT JOIN pg_catalog.pg_tablespace t ON (t.oid = c.reltablespace)
WHERE c.relkind = 'r'
ORDER BY "TABLE_TYPE", "TABLE_CAT", "TABLE_SCHEM", "TABLE_NAME"
-- The change is to the "pg_tablespace_location(t.oid)" line in the SELECT list
-- PG view "pg_tablespace" was changed in:
-- http://www.postgresql.org/docs/current/static/release-9-2.html#AEN110896
-- Note section E.5.2.1
-- Suggested fix for pg versions >= 9.2:
SELECT NULL::text AS "TABLE_CAT"
, quote_ident(n.nspname) AS "TABLE_SCHEM"
, quote_ident(c.relname) AS "TABLE_NAME"
, CASE
WHEN c.relkind = 'v' THEN
CASE WHEN quote_ident(n.nspname) ~ '^pg_' THEN 'SYSTEM VIEW' ELSE 'VIEW' END
ELSE
CASE WHEN quote_ident(n.nspname) ~ '^pg_' THEN 'SYSTEM TABLE' ELSE 'TABLE' END
END AS "TABLE_TYPE"
, d.description AS "REMARKS"
, quote_ident(t.spcname) AS "pg_tablespace_name"
, pg_tablespace_location(t.oid) AS "pg_tablespace_location"
FROM pg_catalog.pg_class AS c
LEFT JOIN pg_catalog.pg_description AS d ON (c.relfilenode = d.objoid AND d.objsubid = 0)
LEFT JOIN pg_catalog.pg_namespace n ON (n.oid = c.relnamespace)
LEFT JOIN pg_catalog.pg_tablespace t ON (t.oid = c.reltablespace)
WHERE c.relkind = 'r'
ORDER BY "TABLE_TYPE", "TABLE_CAT", "TABLE_SCHEM", "TABLE_NAME"
Hello!
I set my limit as --limit="auth_user=id IN (1,2,3,4)". The pg_sample command with the verbose option outputs an error message: "unexpected limit value '4)'".
I think this problem can be fixed by making the regex handle commas inside parentheses here: https://github.com/mla/pg_sample/blob/master/pg_sample#L626C47-L626C54.
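A rough illustration of the parsing problem (a sketch, not pg_sample's actual code): splitting the --limit string on every comma shatters any rule whose value contains commas, whereas a split that ignores commas nested inside parentheses keeps the rule whole:

```python
limit = 'auth_user=id IN (1,2,3,4)'

# A naive split on commas breaks the IN (...) list into bogus rules:
print(limit.split(','))  # ['auth_user=id IN (1', '2', '3', '4)']

def split_rules(s: str) -> list[str]:
    """Split on commas that are not nested inside parentheses."""
    rules, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
        elif ch == ',' and depth == 0:
            rules.append(s[start:i].strip())
            start = i + 1
    rules.append(s[start:].strip())
    return rules

print(split_rules(limit))  # ['auth_user=id IN (1,2,3,4)']
print(split_rules('django_*=*,auth_user=id IN (1,2)'))
# ['django_*=*', 'auth_user=id IN (1,2)']
```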
If you create a sample database with a single table as follows:
CREATE TABLE some_numbers(
id int PRIMARY KEY GENERATED ALWAYS AS IDENTITY
, important_value int
, double_value int GENERATED ALWAYS AS (important_value*2) STORED
);
INSERT INTO some_numbers(important_value) (SELECT * FROM generate_series(1,100));
Then dump the table using:
pg_sample --limit="some_numbers=10%" --file=pg_sample_generated_bug.sql
You get a dump file that includes the generated column, which you can't import because the schema forbids inserting into it:
\i dump_files/pg_sample_generated_bug.sql
SET
SET
SET
SET
SET
set_config
------------
(1 row)
SET
SET
SET
SET
SET
SET
CREATE TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
SET
SET
SET
SET
SET
ALTER TABLE
psql:dump_files/pg_sample_generated_bug.sql:85: ERROR: extra data after last expected column
CONTEXT: COPY some_numbers, line 1: "14 14 28"
ALTER TABLE
Note that this bug doesn't occur with raw pg_dump, which correctly omits the generated column.
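For context, pg_dump avoids this by listing only the insertable columns in its COPY statement. A sketch of the difference, and of how generated columns can be detected in the catalogs (PG 12+):

```sql
-- Fails on import: the data rows carry id, important_value AND double_value,
-- but double_value is GENERATED ALWAYS AS ... STORED and cannot be inserted.
-- COPY some_numbers FROM stdin;

-- Works: name only the non-generated columns.
-- COPY some_numbers (id, important_value) FROM stdin;

-- Detecting generated columns ('' = plain, 's' = stored generated):
SELECT attname, attgenerated
FROM pg_attribute
WHERE attrelid = 'some_numbers'::regclass
  AND attnum > 0
  AND NOT attisdropped;
```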
Hello,
Thank you for the handy tool.
Is there any way to use wildcards like * and ? in table names? I can use * after the schema name, but I cannot specify a table prefix.
pg_sample returns the following error when I run it from within the Docker container:
root@docker-desktop:/pg_sample# ./pg_sample dev-db --host=localhost --port=63123 -U postgres -W pass --limit=20 --verbose --keep
Server encoding is UTF8
Client encoding is UTF8
Exporting schema
Can't exec "pg_dump": No such file or directory at ./pg_sample line 471.
command 'pg_dump --schema-only' failed: -1 at ./pg_sample line 471.
Done.
Say I had the tables school and class, and class depends on school. If I dumped 100 classes from the class table, can I then specify a limit such as school = pg_sample.classes = school_ids (meaning I want only the schools referenced by the classes that pg_sample already pulled)?
Hello again :)
I wondered whether it would make sense to extend the limit parameter so that wildcards are permitted in relation names.
Example: limit="meta.*=0" dumps no data from the tables in the meta schema. But what if I want to exclude only specific relations (for example partitions) that share common name parts except for a suffix (for example meta.test_1, meta.test_2, ...)? limit="meta.test_*=0" does not work currently, so I would have to specify every partition separately.
Just a proposal; I don't know if it makes sense. And if it were too hard to implement, it might not be worth the effort.
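One way such wildcard matching could work (a sketch, not pg_sample's implementation) is glob-style matching of limit patterns against schema-qualified relation names, as Python's fnmatch does:

```python
from fnmatch import fnmatchcase

# Match schema-qualified relation names against glob-style limit patterns.
print(fnmatchcase('meta.test_1', 'meta.test_*'))  # True
print(fnmatchcase('meta.test_2', 'meta.test_*'))  # True
print(fnmatchcase('meta.other', 'meta.test_*'))   # False
print(fnmatchcase('meta.test_1', 'meta.*'))       # True
```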
When I run the script in the contrib directory against my database, I am able to get my sample of data with JSON columns. However, after I get my data sample, I see the following in my logs:
CONTEXT: SQL function "jsoncmp" during startup
automatic analyze of table "database.public.table"
ERROR: operator does not exist: json = json at character 35
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
QUERY:
select case
when $1 = $2 then 0
when $1 < $2 then -1
else 1
end
I am running Postgres 12.10 in an Ubuntu docker container on a Macbook Pro M1.
How can I fix this?
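Judging from the logged query, the jsoncmp helper compares json values with = and <, operators that plain json lacks. One possible rewrite (a sketch, assuming a cast to jsonb, which does have a full set of comparison operators, is acceptable for your data):

```sql
CREATE OR REPLACE FUNCTION jsoncmp(a json, b json) RETURNS int
LANGUAGE sql IMMUTABLE AS $$
  SELECT CASE
    WHEN a::jsonb = b::jsonb THEN 0
    WHEN a::jsonb < b::jsonb THEN -1
    ELSE 1
  END
$$;
```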
DBD::Pg::db table_info failed: ERROR: column t.spclocation does not exist
LINE 11: ...t(t.spcname) AS "pg_tablespace_name", quote_ident(t.spclocat...
^ at /usr/bin/pg_sample line 296
main::__ANON__('DBD::Pg::db table_info failed: ERROR: column t.spclocation d...', 'DBI::db=HASH(0x1bc4eb8)', 'DBI::st=HASH(0x1bc5398)') called at /usr/bin/pg_sample line 499
Maybe this is a known issue, but I don't know what to do about this error. I know that the spclocation column has not existed in pg_tablespace since PostgreSQL 9.2, and I'm using PostgreSQL 9.5. So, what can I do to resolve this?
Thanks.
Packages installed:
perl-DBD-Pg-2.17.1
perl-DBI-1.622-1
postgresql 9.5
I've downloaded pg_sample from yum; pg_sample version 1.09.
Is it possible to export data with INSERT statements instead of COPY?
Hello,
I love your tool and have already used it quite frequently. However, in a current use case, a "bug" came to our attention.
When sampling a database, the data of partitions seems to be dumped in the wrong way: the same data ends up both in the parent table and in the respective partition, which, in our scenario with constraints, results in duplicate-key errors.
One possibility would be to detect whether the table to dump is the parent of a partition and, if so, skip its data dump. All data would then come from the child partitions, and no constraint problems should occur.
Keep up the cool work, and thanks for your efforts.
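The detection proposed above is straightforward against the catalogs: declaratively partitioned parents (PG 10+) are marked with relkind = 'p' and hold no rows themselves, so their data dump could be skipped. A sketch:

```sql
-- List partitioned parent tables; their rows live only in the leaf partitions.
SELECT n.nspname AS schema_name, c.relname AS table_name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'p';
```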
Hi,
The output generated from our database contains lots of \N and \n sequences, which psql interprets when I try to import the dump into a new database. The encoding of both the source and the target is UTF-8.
Is there a way to avoid this error?
Thank you
Hi!
I tried running pg_sample on my database instance, but one of the tables has a column with the json data type, and PG throws the following error on line 673: could not identify an equality operator for type json.
I think the query pg_sample is trying to execute is:
INSERT INTO $target_sample_table
SELECT DISTINCT t1.*
FROM $target_table t1
JOIN $sample_fk_table f1 ON ($join1)
LEFT JOIN $target_sample_table s1 ON ($join2)
WHERE $where
I looked around on Stack Overflow, and apparently it has to do with DISTINCT being applied to json columns.
Is there a way to fall back to a DISTINCT-less query when there's a json column?
Hey,
When I have a table with fewer than 100 records and use the --random parameter, I get this error:
[limit] .* = 100
Creating table "_pg_sample"."test" DBD::Pg::db do failed: ERROR: sample percentage must be between 0 and 100 at pg_sample line 300.
main::__ANON__("DBD::Pg::db do failed: ERROR: sample percentage must be betw"..., DBI::db=HASH(0xaaab11eba948), undef) called at pg_sample line 572
Dropping sample schema _pg_sample
Any thoughts? Thank you.
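I don't know pg_sample's internals here, but if --random is implemented with TABLESAMPLE, the percentage is presumably derived from limit / row_count, which exceeds 100 whenever the table holds fewer rows than the limit; clamping would avoid the error. A sketch (the function name is my own):

```python
def sample_percentage(limit_rows: int, table_rows: int) -> float:
    """Percentage for TABLESAMPLE; Postgres requires a value in [0, 100]."""
    if table_rows <= 0:
        return 0.0
    return min(100.0, 100.0 * limit_rows / table_rows)

print(sample_percentage(100, 50))    # 100.0  (table smaller than the limit)
print(sample_percentage(100, 1000))  # 10.0
```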
Hello,
I'm on the latest version of pg_sample.
I've got this error:
DBD::Pg::db do failed: ERROR: could not identify an equality operator for type lquery
LINE 3: SELECT DISTINCT t1.*
^ at ./pg_sample line 296.
main::__ANON__("DBD::Pg::db do failed: ERROR: could not identify an equality"..., DBI::db=HASH(0x5604c5186bc0), undef) called at ./pg_sample line 644
Command:
./pg_sample -U user database --limit="test.user = 1000, test.user=active,test.user_has_action = 100000,test.user_has_review = 10000 ,test.entity_has_comment=10000, test.user_has_rating=100000, date.* = *, comp.* = *" | psql -U user dbsample
Thanks for your help.
To reproduce:
Not sure if it's better to always drop the _pg_sample schema at startup if it exists, or to trap the exit signal and remove it then.
How do I install this?
DBD::Pg::db do failed: ERROR: could not identify an equality operator for type json
LINE 3: SELECT DISTINCT t1.*
^ at ./pg_sample line 296.
main::__ANON__("DBD::Pg::db do failed: ERROR: could not identify an equality"..., DBI::db=HASH(0x55d7b81db5b0), undef) called at ./pg_sample line 644
Branch release/1.13 seems to have been merged to master in this commit, but 1.12 is shown as the latest release on GitHub.
Should a new release be made?
Hi,
I'm attempting to run this on a read-only replica of a database, and I noticed that it fails because of the following line:
Line 488 in 27c2916
I noticed that pg_sample triggers a CREATE SCHEMA there. Could you explain why I need write/create access to complete this task?
Are there any alternatives that would allow me to run this against a read-only db?
Hi!
It seems the rows in the resulting SQL are not deterministic: two identical runs against the same database can produce different SQL, containing the same rows but in a different order. This is a bit of a deal breaker when dumps must be versioned.
Postgres doesn't guarantee row order unless there is an ORDER BY clause. So maybe instead of COPY $sample_table TO STDOUT we should have COPY (SELECT * FROM $sample_table ORDER BY id) TO STDOUT (with some logic to retrieve the primary key).
Thanks!!
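The primary-key lookup mentioned above can be done from the catalogs; a sketch (my_table and id are placeholders, not real names):

```sql
-- Find the primary-key column(s) of a table:
SELECT a.attname
FROM pg_index i
JOIN pg_attribute a ON a.attrelid = i.indrelid
                   AND a.attnum = ANY (i.indkey)
WHERE i.indrelid = 'my_table'::regclass
  AND i.indisprimary;

-- Then emit rows in a stable order:
COPY (SELECT * FROM my_table ORDER BY id) TO STDOUT;
```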
pg_dump can't be used on databases newer than its version, and the default version of the psql client libraries for buster is 14.
The best solution is probably to update the Perl base image to a newer version of Debian. Buster is in LTS and goes completely out of support in June 2024
I can whip up a quick PR if that's helpful?
Hi there,
I tried to use pg_sample with the --no-privileges option, but it does not work. Is this option available in pg_sample?
Here are some of the errors:
psql:/var/lib/postgresql/all_data.sql:29313: ERROR: role "tester" does not exist
psql:/var/lib/postgresql/all_data.sql:51634: ERROR: syntax error at or near "s"
LINE 1: s:7:"deleted";
psql:/var/lib/postgresql/all_data.sql:56352: invalid command \N
psql:/var/lib/postgresql/all_data.sql:56353: invalid command \N
psql:/var/lib/postgresql/all_data.sql:56354: invalid command \N
psql:/var/lib/postgresql/all_data.sql:56355: invalid command \N
psql:/var/lib/postgresql/all_data.sql:56356: invalid command \N
psql:/var/lib/postgresql/all_data.sql:56357: invalid command \N
psql:/var/lib/postgresql/all_data.sql:56358: invalid command \N
psql:/var/lib/postgresql/all_data.sql:56359: invalid command \N
psql:/var/lib/postgresql/all_data.sql:56360: invalid command \N
psql:/var/lib/postgresql/all_data.sql:56361: invalid command \N
Thanks
Lucas