Comments (7)

mla commented on July 16, 2024

I could see that. What do you think the command syntax should look like for that?

from pg_sample.

chimeno commented on July 16, 2024

I'm currently using:
./pg_sample --limit="*=*, nodes_data=1000000"

I guess something like:

./pg_sample --limit="*=*, nodes_data=1000000;order by timestamp DESC"

or

./pg_sample --limit="*=*, nodes_data=1000000(order by timestamp DESC)"

Either should be easy to parse and is extensible in case other criteria are added.

mla commented on July 16, 2024

You should be able to specify a WHERE condition after the =, e.g.:

--limit="users=(user_id < 10)"

lustickd commented on July 16, 2024

@mla I had a similar question: is it possible to sample EVERY table in DESC order? I think all (most?) tables in Rails, for example, have a "created_at" column, so it'd be nice to sample rows with ORDER BY created_at DESC by default, since the early rows in a big database are usually mostly inactive. I'm trying --random, but it might be too slow for my purposes.

mla commented on July 16, 2024

Hey @lustickd. Sorry for the delay in responding.

You can try this patch, which should just force that ORDER BY for every table.

diff --git a/pg_sample b/pg_sample
index a73af39..a1b5ec8 100755
--- a/pg_sample
+++ b/pg_sample
@@ -630,6 +630,7 @@ while (my $row = lower_keys($sth->fetchrow_hashref)) {
       notice "No candidate key found for '$table'; ignoring --ordered";
     }
   }
+  $order = 'created_at DESC';

We'd have to look at how we can express that for general use. Rails doesn't automatically create an index on all created_at columns, does it? That would be my worry, if you have really large tables.

mla commented on July 16, 2024

You might try this:

--- a/pg_sample
+++ b/pg_sample
@@ -624,7 +624,11 @@ while (my $row = lower_keys($sth->fetchrow_hashref)) {
   } elsif ($opt{ordered}) {
     my @cols = find_candidate_key($table);
     if (@cols) {
-      my $cols = join ', ', map { $dbh->quote_identifier($_) } @cols;
+      my $cols = join ', ',
+        map { "$_ DESC" }
+        map { $dbh->quote_identifier($_) }
+        @cols
+      ;
       $order = "ORDER BY $cols";
     } else {
       notice "No candidate key found for '$table'; ignoring --ordered";

Then pass the --ordered option. We order by the first candidate key we find. Rails usually has its "id" column, which should roughly match created_at order, I would think. The patch above just adds DESC to those columns, which seems like a reasonable default for that option anyway.
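
For readers who don't read Perl, the logic of that patch can be sketched in Python. This is only an illustration of the idea, not pg_sample's code: the names `quote_identifier` and `build_order_clause` are made up for this example, and the quoting rule (wrap in double quotes, double any embedded double quote) is the standard SQL one rather than whatever the database driver does.

```python
def quote_identifier(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_order_clause(key_columns, descending=True):
    """Return an ORDER BY clause over the key columns, or '' if there are none.

    Hypothetical helper: mirrors the patch by appending DESC to every
    quoted candidate-key column.
    """
    if not key_columns:
        # No candidate key: nothing to order by.
        return ""
    direction = " DESC" if descending else ""
    cols = ", ".join(quote_identifier(c) + direction for c in key_columns)
    return f"ORDER BY {cols}"


print(build_order_clause(["id"]))
print(build_order_clause(["tenant_id", "id"]))
```

With a single "id" key this yields a clause that sorts newest-first on tables where ids are assigned in insertion order, which is the assumption the suggestion above relies on.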

lustickd commented on July 16, 2024

Ah, that makes sense, thanks. Yeah, I think created_at doesn't have an index, so I'll go with the id method 👍

I did mess around a little with tsm_system_rows for random sampling, and it's significantly faster than ORDER BY random() on a table with 40 million rows: about 300 milliseconds per table instead of 30 seconds. Apparently ORDER BY random() has to read and sort the entire table, which makes it extremely slow.
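
For anyone wanting to try the same comparison, here is a rough Python sketch that builds the two queries side by side. It is not part of pg_sample; the function names are hypothetical and `nodes_data` is just the table name used earlier in this thread. It assumes the tsm_system_rows extension has been enabled with `CREATE EXTENSION tsm_system_rows;`. Note that SYSTEM_ROWS samples whole blocks, so the rows it returns can be clustered rather than uniformly random.

```python
def quote_identifier(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def random_sample_sql(table: str, rows: int, use_system_rows: bool = True) -> str:
    """Build a query that samples up to `rows` rows from `table`.

    Hypothetical helper for illustration only.
    """
    rows = int(rows)
    if rows < 0:
        raise ValueError("rows must be non-negative")
    ident = quote_identifier(table)
    if use_system_rows:
        # Block-level sampling via the tsm_system_rows extension.
        # Requires: CREATE EXTENSION tsm_system_rows;
        return f"SELECT * FROM {ident} TABLESAMPLE SYSTEM_ROWS({rows})"
    # Fallback: assigns a random sort key to every row, so the whole
    # table is scanned before the LIMIT applies.
    return f"SELECT * FROM {ident} ORDER BY random() LIMIT {rows}"


print(random_sample_sql("nodes_data", 1000))
print(random_sample_sql("nodes_data", 1000, use_system_rows=False))
```

Running each generated query under `EXPLAIN ANALYZE` is one way to check whether the timings reported above hold for a given table.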
