Comments (9)
Seems doable. Just to understand the use case, why are you versioning them?
There's no actual guarantee that the selected rows will be the same anyway, is there? I mean, you can explicitly request "random" results, but otherwise, there's no selection criteria, so we're just relying on whichever rows Pg decides to give us. Usually that's deterministic, but idk if you were to rebuild or change indexes or some such if that would affect which rows are returned. IOW, I'm not sure you can rely on the row selection being deterministic, if that's what you need...
from pg_sample.
I'm versioning them because I use pg_sample to produce a seed data dump for development extracted from production data. It's an iterative process as I include more tables every now and then, and want to keep these changes as small as possible to it's easier to see what actually changed.
Of course if the rows are not the same (due to random selection or changes in the data) I don't expect it to be the same :-) But as you said, currently it outputs the row in order postgres returns them, and as you hinted it's indeed not guaranteed to be deterministic (vacuum, indices rebuilds, etc). The only way to get guaranteed deterministic results is to specify the ordering explicitly.
from pg_sample.
Thank you for the commits @olivierdalang. I merged it, but see my changes. There were some issues when a table doesn't have a primary key defined (I expanded it to look for any candidate key, but try to use primary key first if avail).
And there were some issues with quoting if the index columns have reserved words and such (see my quote_identifier calls).
It's passing the tests with --ordered applied now.
I think we should rename the option though. We have a --random option, which randomizes the sample selection. I would have assumed that --ordered is the opposite... but it's not. --ordered orders just the output of the sample set, and the sample itself may still be random or may change if it's rebuilt, etc.
I'm still not clear on the use-case, tbh, but I'm happy to keep the option if it's helpful, but let's think of a different name.... --order-sample-output or --order-sample-result or some such?
from pg_sample.
Thanks for the merge !
From your comment, I realize my pull request only does half of the job (as actually --ordered
is supposed be the opposite of --random
, and should sort on the source table). Just opened a followup PR for that : #33
Sorry, I went a bit too quickly over that 😳 I'd also be happy to include a test case for this new option, but a bit lost about how tests cases are organized (not familiar with perl at all)
from pg_sample.
The tests are under the t/ subdir.
Perl uses TAP (Test Anything Protocol):
https://en.wikipedia.org/wiki/Test_Anything_Protocol
If you have Perl installed locally then you probably have "prove", which runs tests. If you run prove from the project root, it should try to run everything.
e..g, I get
$ prove
t/pg_sample.t .. ok
All tests successful.
Files=1, Tests=12, 1 wallclock secs ( 0.01 usr 0.00 sys + 0.20 cusr 0.03 csys = 0.24 CPU)
Result: PASS
Run prove --verbose for verbosity.
If you can't find prove you should still be able to run the tests directly: perl t/pg_sample.t
If you take a look a pg_sample.t, you'll see all we're doing is creating a bunch of tables with FKs and such, populating them, then running pg_sample on it. We then load the resulting sample and do some basic checks on it.
Note that some of the naming is intentionally weird, to test using removed words and such (to make sure our quoting/escaping works).
from pg_sample.
On PR #33 can you compare to the changes I made on dev? I added a find_candidate_key() method or some such which I think is more general.
I'm not sure about the '$table'::regclass thing. Will that work correctly if there's multiple tables with the same name in different schemas?
And my code will work on tables that do not have a primary key.
from pg_sample.
I merged your changes in #33, so I there's no '$table'::regclass
anymore (look at the changes in https://github.com/mla/pg_sample/pull/33/files -> I'll remove unrelated whitespace changes).
Ok for the tests, will give it a shot to try to make a testcase for this feature.
Thanks for willing to accept my patches and the primer about Perl testing :-)
from pg_sample.
(continuing the discussion about the PR on the PR itself)
from pg_sample.
Sorry for the delay on this @olivierdalang. Just merged into dev. Looks great to me, thank you. And thanks for adding the tests.
Will make a new release soon.
from pg_sample.
Related Issues (20)
- Feature Request: Wildcards in relations in limit HOT 1
- Docker: Can't exec "pg_dump" HOT 3
- Docker workflow HOT 7
- Can't run on a read-only DB. HOT 9
- Is it possibe to use the --limit query based on the contents of one of the sample tables HOT 3
- Could not identify an equality operator for type json HOT 1
- Error using --random parameter HOT 5
- wildcards don't seem to be working in "--limit" HOT 1
- Export data with inserts statements HOT 4
- Version 1.13 missing from github releases HOT 2
- Use of uninitialized value $sample_fk_table in concatenation (.) or string at pg_sample-master/pg_sample line 685 HOT 1
- psql:mini.sql:254159: ERROR: permission denied: "RI_ConstraintTrigger_c_11890996" is a system trigger HOT 3
- Is there a way to export a sample by a table row? HOT 3
- sample_schema table naming greater than 63 characters HOT 10
- ERROR: operator does not exist: json = json at character 35 HOT 1
- psql invalid command \n or \N HOT 5
- docker image can't sample databases newer than postgres 14
- pg_sample doesn't correctly handle generated columns HOT 6
- Query with a list can`t be parsed
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pg_sample.