Comments (7)
I could see that. What do you think the command syntax should look like for that?
from pg_sample.
I'm currently using:
./pg_sample --limit="*=*, nodes_data=1000000"
I guess something like:
./pg_sample --limit="*=*, nodes_data=1000000;order by timestamp DESC"
or
./pg_sample --limit="*=*, nodes_data=1000000(order by timestamp DESC)"
Either should be easy to parse, and both are extensible in case other criteria are added.
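To make the parenthesized variant concrete, here is a minimal parsing sketch in Python. It is not how pg_sample parses --limit today (pg_sample is Perl, and this order-by suffix is only a proposal in this thread); the names `LimitRule`, `split_rules` and `parse_limit` are made up for illustration. It only handles the proposed `table=limit(order by ...)` form, not the existing `table=(condition)` form.

```python
import re
from typing import List, NamedTuple, Optional


class LimitRule(NamedTuple):
    pattern: str             # table name or wildcard, e.g. "nodes_data" or "*"
    limit: str               # row count or "*", kept as text
    order_by: Optional[str]  # text inside "(order by ...)", or None


# Hypothetical grammar: pattern = limit, optionally followed by "(order by <expr>)"
_RULE = re.compile(
    r"^\s*(?P<pattern>[^=\s]+)\s*=\s*(?P<limit>[^\s(]+)\s*"
    r"(?:\(\s*order\s+by\s+(?P<order>.+?)\s*\))?\s*$",
    re.IGNORECASE,
)


def split_rules(spec: str) -> List[str]:
    """Split on commas that are not inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in spec:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [p for p in parts if p.strip()]


def parse_limit(spec: str) -> List[LimitRule]:
    """Parse a --limit style string into a list of rules."""
    rules = []
    for part in split_rules(spec):
        m = _RULE.match(part)
        if m is None:
            raise ValueError(f"cannot parse limit rule: {part!r}")
        rules.append(LimitRule(m["pattern"], m["limit"], m["order"]))
    return rules
```

One reason to prefer the parenthesized form over the `;` form: a multi-column ORDER BY contains commas, and the parentheses make it unambiguous which commas separate rules.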
from pg_sample.
You should be able to specify a WHERE condition after the =, e.g.:
--limit="users=(user_id < 10)"
from pg_sample.
@mla I have a similar question: is it possible to sample EVERY table in DESC order? I think all (most?) tables in Rails, for example, have "created_at", so it'd be nice to sample rows with ORDER BY created_at DESC as the default, since the early rows in a big database are usually a bunch of inactive rows. I'm trying with --random, but it might be too slow for my purposes.
from pg_sample.
Hey @lustickd. Sorry for the delay in responding.
You can try this patch, which should just force that ORDER BY for every table.
diff --git a/pg_sample b/pg_sample
index a73af39..a1b5ec8 100755
--- a/pg_sample
+++ b/pg_sample
@@ -630,6 +630,7 @@ while (my $row = lower_keys($sth->fetchrow_hashref)) {
notice "No candidate key found for '$table'; ignoring --ordered";
}
}
+ $order = 'ORDER BY created_at DESC';
We'd have to look at how we can express that for general use. Rails doesn't automatically create an index on all created_at columns, does it? That would be my worry, if you have really large tables.
from pg_sample.
You might try this:
--- a/pg_sample
+++ b/pg_sample
@@ -624,7 +624,11 @@ while (my $row = lower_keys($sth->fetchrow_hashref)) {
} elsif ($opt{ordered}) {
my @cols = find_candidate_key($table);
if (@cols) {
- my $cols = join ', ', map { $dbh->quote_identifier($_) } @cols;
+ my $cols = join ', ',
+ map { "$_ DESC" }
+ map { $dbh->quote_identifier($_) }
+ @cols
+ ;
$order = "ORDER BY $cols";
} else {
notice "No candidate key found for '$table'; ignoring --ordered";
And pass the --ordered option. We order by the first candidate key we find. Rails usually has its "id" column, which should roughly match created_at, I would think. Patch above just adds DESC to those columns. Seems like a reasonable default anyway for that option.
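For anyone who doesn't read Perl, this is roughly the string that the patched join/map builds, sketched in Python. It only mirrors the logic of the diff above and is not pg_sample code; `quote_ident` is a stand-in for DBI's quote_identifier, and `order_clause` is a made-up name.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def order_clause(candidate_key: List[str]) -> str:
    """Build a descending ORDER BY over the candidate key columns.

    Returns an empty string when no candidate key was found, which is
    the case where pg_sample prints its "ignoring --ordered" notice.
    """
    if not candidate_key:
        return ""
    cols = ", ".join(f"{quote_ident(c)} DESC" for c in candidate_key)
    return f"ORDER BY {cols}"
```

With a typical Rails table whose candidate key is just "id", this gives `ORDER BY "id" DESC`, so the newest rows are sampled first.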
from pg_sample.
Ah, that makes sense, thanks. Yeah, I think created_at doesn't have an index, so I'll go with the id method 👍
I did mess around a little bit with tsm_system_rows for random sampling, and it's significantly faster than using ORDER BY random() on a table with 40 million rows: it runs in 300 milliseconds per table instead of 30 seconds. Apparently ORDER BY random() has to read and sort the entire table, which makes it extremely slow.
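For reference, the two query shapes being compared look roughly like this. It is a small Python sketch that only builds the SQL text; the helper names are made up, and pg_sample itself does not use tsm_system_rows as far as this thread shows. The extension has to be installed first with CREATE EXTENSION tsm_system_rows.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def sample_by_sort(table: str, n: int) -> str:
    # Every row gets a random sort key, so the whole table is scanned
    # and sorted before LIMIT is applied.
    return f"SELECT * FROM {quote_ident(table)} ORDER BY random() LIMIT {int(n)}"


def sample_by_system_rows(table: str, n: int) -> str:
    # Requires: CREATE EXTENSION tsm_system_rows;
    # Reads randomly chosen blocks until n rows have been collected.
    return f"SELECT * FROM {quote_ident(table)} TABLESAMPLE SYSTEM_ROWS({int(n)})"
```

One caveat: SYSTEM_ROWS samples at the block level, so the rows it returns tend to be clustered by physical location rather than being a fully uniform random sample. For building a small test database that is usually an acceptable trade for the speed.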
from pg_sample.