Git Product home page Git Product logo

Comments (9)

mla avatar mla commented on July 16, 2024

Seems doable. Just to understand the use case, why are you versioning them?

There's no actual guarantee that the selected rows will be the same anyway, is there? I mean, you can explicitly request "random" results, but otherwise, there's no selection criteria, so we're just relying on whichever rows Pg decides to give us. Usually that's deterministic, but idk if you were to rebuild or change indexes or some such if that would affect which rows are returned. IOW, I'm not sure you can rely on the row selection being deterministic, if that's what you need...

from pg_sample.

olivierdalang avatar olivierdalang commented on July 16, 2024

I'm versioning them because I use pg_sample to produce a seed data dump for development extracted from production data. It's an iterative process as I include more tables every now and then, and want to keep these changes as small as possible to it's easier to see what actually changed.

Of course if the rows are not the same (due to random selection or changes in the data) I don't expect it to be the same :-) But as you said, currently it outputs the row in order postgres returns them, and as you hinted it's indeed not guaranteed to be deterministic (vacuum, indices rebuilds, etc). The only way to get guaranteed deterministic results is to specify the ordering explicitly.

from pg_sample.

mla avatar mla commented on July 16, 2024

Thank you for the commits @olivierdalang. I merged it, but see my changes. There were some issues when a table doesn't have a primary key defined (I expanded it to look for any candidate key, but try to use primary key first if avail).

And there were some issues with quoting if the index columns have reserved words and such (see my quote_identifier calls).

It's passing the tests with --ordered applied now.

I think we should rename the option though. We have a --random option, which randomizes the sample selection. I would have assumed that --ordered is the opposite... but it's not. --ordered orders just the output of the sample set, and the sample itself may still be random or may change if it's rebuilt, etc.

I'm still not clear on the use-case, tbh, but I'm happy to keep the option if it's helpful, but let's think of a different name.... --order-sample-output or --order-sample-result or some such?

from pg_sample.

olivierdalang avatar olivierdalang commented on July 16, 2024

Thanks for the merge !

From your comment, I realize my pull request only does half of the job (as actually --ordered is supposed be the opposite of --random, and should sort on the source table). Just opened a followup PR for that : #33

Sorry, I went a bit too quickly over that 😳 I'd also be happy to include a test case for this new option, but a bit lost about how tests cases are organized (not familiar with perl at all)

from pg_sample.

mla avatar mla commented on July 16, 2024

The tests are under the t/ subdir.

Perl uses TAP (Test Anything Protocol):
https://en.wikipedia.org/wiki/Test_Anything_Protocol

If you have Perl installed locally then you probably have "prove", which runs tests. If you run prove from the project root, it should try to run everything.

e..g, I get

$ prove
t/pg_sample.t .. ok     
All tests successful.
Files=1, Tests=12,  1 wallclock secs ( 0.01 usr  0.00 sys +  0.20 cusr  0.03 csys =  0.24 CPU)
Result: PASS

Run prove --verbose for verbosity.

If you can't find prove you should still be able to run the tests directly: perl t/pg_sample.t

If you take a look a pg_sample.t, you'll see all we're doing is creating a bunch of tables with FKs and such, populating them, then running pg_sample on it. We then load the resulting sample and do some basic checks on it.

Note that some of the naming is intentionally weird, to test using removed words and such (to make sure our quoting/escaping works).

from pg_sample.

mla avatar mla commented on July 16, 2024

On PR #33 can you compare to the changes I made on dev? I added a find_candidate_key() method or some such which I think is more general.

I'm not sure about the '$table'::regclass thing. Will that work correctly if there's multiple tables with the same name in different schemas?

And my code will work on tables that do not have a primary key.

from pg_sample.

olivierdalang avatar olivierdalang commented on July 16, 2024

I merged your changes in #33, so I there's no '$table'::regclass anymore (look at the changes in https://github.com/mla/pg_sample/pull/33/files -> I'll remove unrelated whitespace changes).

Ok for the tests, will give it a shot to try to make a testcase for this feature.

Thanks for willing to accept my patches and the primer about Perl testing :-)

from pg_sample.

olivierdalang avatar olivierdalang commented on July 16, 2024

(continuing the discussion about the PR on the PR itself)

from pg_sample.

mla avatar mla commented on July 16, 2024

Sorry for the delay on this @olivierdalang. Just merged into dev. Looks great to me, thank you. And thanks for adding the tests.

Will make a new release soon.

from pg_sample.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.