Git Product home page Git Product logo

Comments (9)

mla avatar mla commented on July 16, 2024

Interesting.

What pg_sample does is copy a subset of data from each table into temp tables under the "sample_schema". If then continues to recursively add data to them to satisfy any FK constraints. Then the result is written to stdout as a dump under the original table names.

How are the permissions set? Is it possible to only allow writing to a specific schema?

from pg_sample.

mehulp118 avatar mehulp118 commented on July 16, 2024

It seems I only have SELECT permissions on this particular DB unfortunately.

Do you have any alternative to this by any chance? The script is more or less fantastic for my use case but just have this one hurdle.

from pg_sample.

mla avatar mla commented on July 16, 2024

I'm not sure.

There are ways to do cross-database queries, apparently. I've never tried any of them:
https://stackoverflow.com/questions/46324/possible-to-perform-cross-database-queries-with-postgresql

So if we could run all the queries on the read-only but write them to a separate writable database then that might work, but I'm sure would require some changes. Would need to experiment with it. I'll try to read up a bit on the options if you think that's a possible route?

from pg_sample.

mehulp118 avatar mehulp118 commented on July 16, 2024

That seems like an awesome solution!

I think it would be really helpful for not only me but anyone else who comes across a similar issue with restrictions.

I'm also going to look into this on my end. 👍

Thank you

from pg_sample.

mla avatar mla commented on July 16, 2024

Ok, I think I have something kinda sorta working.

I created the foreign db connection with postgres_fdw. I connected the "public" remote schema to a local schema. The way the existing code iterates over the tables doesn't bring back the foreign tables, so you need this patch:

diff --git a/pg_sample b/pg_sample
index 07cb37d..02ca042 100755
--- a/pg_sample
+++ b/pg_sample
@@ -506,16 +506,23 @@ notice "[limit] $_->[0] = $_->[1]\n" foreach @limits;
 # more than --limit rows initially.
 my @tables;
 my %sample_tables; # real table name -> sample table name
-my $sth = $dbh->table_info(undef, undef, undef, 'TABLE');
+my $sth = $dbh->prepare(q{
+  SELECT *
+    FROM information_schema.tables
+   WHERE table_type NOT IN ('VIEW')
+});
+$sth->execute;
 while (my $row = lower_keys($sth->fetchrow_hashref)) {
-  next unless uc $row->{table_type} eq 'TABLE'; # skip SYSTEM TABLE values
-  next if $row->{table_schem} eq 'information_schema'; # special pg schema
-  next if $opt{schema} && $row->{table_schem} ne $opt{schema};
+  #use Data::Dumper;
+  #warn Dumper($row), "\n";
+  # next unless uc $row->{table_type} eq 'TABLE'; # skip SYSTEM TABLE values
+  next if $row->{table_schema} eq 'pg_catalog'; # special pg schema
+  next if $opt{schema} && $row->{table_schema} ne $opt{schema};
 
-  my $sname = $row->{pg_schema} || unquote_identifier($row->{TABLE_SCHEM})
+  my $sname = unquote_identifier($row->{table_schema})
     or die "no pg_schema or TABLE_SCHEM value?!";
 
-  my $tname = $row->{pg_table} || unquote_identifier($row->{TABLE_NAME})
+  my $tname = unquote_identifier($row->{table_name})
     or die "no pg_table or TABLE_NAME value?!";

In terms of connecting the foreign db, this is what I did (in my case the dbs are running on the same instance):

try=# CREATE EXTENSION postgres_fdw;
CREATE EXTENSION

try=# CREATE SERVER ro_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '127.0.0.1', port '5433', dbname 'real_db');
CREATE SERVER

try=# CREATE USER MAPPING FOR mla SERVER ro_server;
CREATE USER MAPPING

try=# CREATE SCHEMA foreign_schema;
CREATE SCHEMA

# XXX load foreign schema locally

try=# IMPORT FOREIGN SCHEMA public FROM SERVER ro_server INTO foreign_schema;
IMPORT FOREIGN SCHEMA

Note that I imported the schema (but not the data) to the temp database because otherwise I was getting undefined type errors when I tried to do the foreign connection.

With that, I'm now able to run pg_sample against the test database. The output however has the foreign connection schema, so it would require some post-processing to get it to match the original db.

from pg_sample.

mehulp118 avatar mehulp118 commented on July 16, 2024

Ah, I tried this out so far and it works for me so far (like you mentioned I had to do some post processing to add data types)

Past this point I think it's a matter of modifying the code that's there to select and insert rows into the temporary table into selecting the sample rows but then using a separate connection to insert them into this schema that's been copied over.

Not sure what that would take though.

from pg_sample.

mehulp118 avatar mehulp118 commented on July 16, 2024

Hm, just tried this again and I noticed that when I try to run it, it errors out saying something about a view? I'd like to ignore the views

DBD::Pg::db do failed: ERROR: relation "public.test" does not exist CONTEXT: column "table_name" of foreign table "test_view" at pg_sample line 300. main::__ANON__("DBD::Pg::db do failed: ERROR: relation \"public.test\" does "..., DBI::db=HASH(0x556be6d11f40), undef) called at pg_sample line 58

from pg_sample.

mla avatar mla commented on July 16, 2024

@mehulp118 You applied the patch ^^^? Not sure why it would be returning views, since we explicitly ask for WHERE table_type NOT IN ('VIEW')

But yes, it seems like it was close to working, except for the naming issue. I'll try to take another look this week.

from pg_sample.

NicParry avatar NicParry commented on July 16, 2024

Has there been any progress on this?

from pg_sample.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.