Git Product home page Git Product logo

Comments (4)

judell avatar judell commented on August 19, 2024 1

Hey @LalitTurbot, thanks for following up on that!

from steampipe-plugin-github.

e-gineer avatar e-gineer commented on August 19, 2024

Definitely makes sense to add the files column. JSON is a reasonable starting point for it too ... if a piece of data from deep in the JSON is used a lot then sometimes we elevate it up to a column on it's own, but that is usually when completely obvious or widely requested (easy to add columns, very hard to deprecate/remove them).

A slight change to the Postgres makes it easier to use I believe. For example:

select
  f ->> 'filename' as name,
  f ->> 'language' as language
from
  github_my_gist as g,
  jsonb_array_elements(g.files) as f

Add a good example like that to the docs with it and it would be good to go / easy to use.

from steampipe-plugin-github.

judell avatar judell commented on August 19, 2024

Hmm. I think it needs to be jsonb_object_keys because the JSON is like so:

 {
  "files": {
   "gistfile1.txt": {
    "filename": "gistfile1.txt",
    "language": "Text",
    "raw_url": "https://gist.githubusercontent.com/judell/9744381/raw/cd2f695f7e776e82ef0c6dc6678a6322a514f5f9/gistfile1.txt",
    "size": 6341,
    "type": "text/plain"
   }
  }
 }

So maybe like this, to clean up the referencing?

> with filenames as (
 select
   id,
   jsonb_object_keys(files) as name
 from github_my_gist
)
select
 f.name,
 g.files -> f.name ->> 'language' as language,
 g.description
from filenames f
join github_my_gist g using (id) limit 5
+---------------------------------+----------+----------------------------------------------------+
| name                            | language | description                                        |
+---------------------------------+----------+----------------------------------------------------+
| internal_and_public_id          | <null>   | convert between hypothesis internal and public ids |
| async-postgres-listener.py      | Python   |                                                    |
| find-unused-security-groups.sql | SQL      | PATTERN: UNION similar things to combine them      |
| plpython.md                     | Markdown |                                                    |
| inventory                       | <null>   | h data inventory                                   |
+---------------------------------+----------+----------------------------------------------------+

from steampipe-plugin-github.

judell avatar judell commented on August 19, 2024

As per https://steampipe.io/blog/adding-a-column-to-a-table, we decided to flatten the object to a simple array of objects. The query solution we gave was:

select 
  f ->> 'language' as language,
  count(*)
from 
  github_my_gist g
cross join
  jsonb_array_elements(g.files) f
group by
  language
order by 
  count desc

On reflection I'm still puzzled by the cross join. The literature suggests that the "Cartesian product" it produces is useful for things like this.

create table colors (color text);
insert into colors(color) values ('red');
insert into colors(color) values ('green');

create table sizes (size text);
insert into sizes(size) values ('small');
insert into sizes(size) values ('medium');
insert into sizes(size) values ('large');

select * from colors cross join sizes;

  size  | color 
--------+-------
 small  | red
 small  | green
 medium | red
 medium | green
 large  | red
 large  | green

Here is a simplified example of the case discussed in the blog post.

create table my_gist(id text, description text, files jsonb);
insert into my_gist(id, description, files) 
values ('b89721e4f71c3f647e8d686887de3008', 'gist-with-two-files', '[{"filename": "file1.md"}, {"filename": "file2.md"}]')

select * from my_gist

                id                |     description     |                        files                         
----------------------------------+---------------------+------------------------------------------------------
 b89721e4f71c3f647e8d686887de3008 | gist-with-two-files | [{"filename": "file1.md", }, {"filename": "file2.md"}]


select jsonb_array_elements(g.files) as files from my_gist

   files
-----------------------
{"filename": "file1.md"}
{"filename": "file2.md"}

Cross joining against the files object.

select 
  g.id,
  g.description,
  files
from 
  my_gist g
cross join
  jsonb_array_elements(g.files) files;


                id                |     description     |                                    files                                     
----------------------------------+---------------------+------------------------------------------------------------------------------
 b89721e4f71c3f647e8d686887de3008 | gist-with-two-files | [{"filename": "file1.md"}, {"filename": "file2.md"}]
 b89721e4f71c3f647e8d686887de3008 | gist-with-two-files | [{"filename": "file1.md"}, {"filename": "file2.md"}]

That seems like a weird Cartesian product! Cross joining with indexing into the files object gets us where we want to go.

select 
  id,
  description,
  f ->> 'filename'
from 
  my_gist g
cross join
  jsonb_array_elements(g.files) f;


                id                |     description     | filename 
----------------------------------+---------------------+----------
 b89721e4f71c3f647e8d686887de3008 | gist-with-two-files | file1.md
 b89721e4f71c3f647e8d686887de3008 | gist-with-two-files | file2.md

But how to explain it? I'm just not seeing how it works.

I've always gone with this approach.

select 
  id,
  description,
  jsonb_array_elements(files) as files
from 
  my_gist;

                id                |     description     |         jsonb_array_elements         
----------------------------------+---------------------+--------------------------------------
 b89721e4f71c3f647e8d686887de3008 | gist-with-two-files | { "filename": "file1.md"}
 b89721e4f71c3f647e8d686887de3008 | gist-with-two-files | { "filename": "file2.md"}

I'm used to how unnesting an array produces array-length rows, and I think of jsonb_array_elements as the jsonb version of that.

And

```sql
select 
  id,
  description,
  jsonb_array_elements(files) ->> 'filename' as filename
from 
  my_gist;
  
                 id                |     description     | filename 
-----------------------------------+---------------------+----------
 b89721e4f71c3f647e8d686887de3008  | gist-with-two-files | file1.md
 b89721e4f71c3f647e8d686887de3008  | gist-with-two-files | file2.md

The unnest <-> jsonb_array_elements analogy enables me to grok this. But the cross join escapes me. How do you get a Cartesian product with two identical rows like this?

 b89721e4f71c3f647e8d686887de3008 | gist-with-two-files | [{"filename": "file1.md"}, {"filename": "file2.md"}]
 b89721e4f71c3f647e8d686887de3008 | gist-with-two-files | [{"filename": "file1.md"}, {"filename": "file2.md"}]

Indexing into it with f ->> 'filename' gets the "right" answer but I just don't grok the mechanism.

                id                |     description     | filename 
----------------------------------+---------------------+----------
 b89721e4f71c3f647e8d686887de3008 | gist-with-two-files | file1.md
 b89721e4f71c3f647e8d686887de3008 | gist-with-two-files | file2.md

from steampipe-plugin-github.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.