Comments (15)

simonw commented on September 28, 2024

The current documented display options are:

Add --nl to collapse these to single lines as valid newline-delimited JSON.

Add --array to output a valid JSON array of objects instead.

Defaulting to pretty-printed invalid newline-delimited JSON was a weird design choice that I made!

% s3-credentials list-buckets
{
    "Name": "aws-cloudtrail-logs-462092780466-f2c900d3",
    "CreationDate": "2021-03-25 22:19:54+00:00"
}
{
    "Name": "simonw-test-bucket-for-s3-credentials",
    "CreationDate": "2021-11-03 21:46:12+00:00"
}

Fixing this would require a major version bump if I had hit 1.0 already.

from s3-credentials.

simonw commented on September 28, 2024

In that case I think the safest default would be to turn the above default into a pretty-printed streaming JSON array:

% s3-credentials list-buckets
[
  {
    "Name": "aws-cloudtrail-logs-462092780466-f2c900d3",
    "CreationDate": "2021-03-25 22:19:54+00:00"
  },
  {
    "Name": "simonw-test-bucket-for-s3-credentials",
    "CreationDate": "2021-11-03 21:46:12+00:00"
  }
]

Then drop the --array option but keep --nl, which would output this (already implemented):

% s3-credentials list-buckets --nl
{"Name": "aws-cloudtrail-logs-462092780466-f2c900d3", "CreationDate": "2021-03-25 22:19:54+00:00"}
{"Name": "simonw-test-bucket-for-s3-credentials", "CreationDate": "2021-11-03 21:46:12+00:00"}

And add a new --csv option.

simonw commented on September 28, 2024

Getting this right will mean I can pipe into sqlite-utils insert easily to create a SQLite database, which would be fun.

Actually this works already:

s3-credentials list-buckets --nl | sqlite-utils insert /tmp/s3.db buckets - --nl

simonw commented on September 28, 2024

Here's how the current list-buckets implementation works:

    if array:
        gathered.append(bucket)
    else:
        if nl:
            click.echo(json.dumps(bucket, default=str))
        else:
            click.echo(json.dumps(bucket, indent=4, default=str))
if gathered:
    click.echo(json.dumps(gathered, indent=4, default=str))

I need a new abstraction I can call that knows how to turn an iterator of rows into one of the desired formats.
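
A minimal sketch of what such an abstraction could look like, assuming every row is a dictionary. The name write_rows and its parameters are hypothetical and are not the s3-credentials implementation. It writes to any file-like object and never holds more than one row in memory:

```python
import csv
import io
import itertools
import json
import textwrap


def write_rows(rows, out, nl=False, use_csv=False):
    """Write an iterator of dict rows to the file-like object `out`."""
    rows = iter(rows)
    if use_csv:
        # Peek at the first row to discover the header names
        first = next(rows, None)
        if first is None:
            return
        writer = csv.DictWriter(out, fieldnames=list(first.keys()))
        writer.writeheader()
        writer.writerows(itertools.chain([first], rows))
    elif nl:
        # Newline-delimited JSON: one compact object per line
        for row in rows:
            out.write(json.dumps(row, default=str) + "\n")
    else:
        # Pretty-printed JSON array, written one row at a time
        wrote_any = False
        for row in rows:
            out.write(",\n" if wrote_any else "[\n")
            out.write(textwrap.indent(json.dumps(row, indent=2, default=str), "  "))
            wrote_any = True
        out.write("\n]\n" if wrote_any else "[]\n")


buffer = io.StringIO()
write_rows(iter([{"Name": "bucket-one"}, {"Name": "bucket-two"}]), buffer)
print(buffer.getvalue())
```

In a CLI, out would be sys.stdout. Writing the separating comma before each row after the first means the function never needs to know whether the current row is the last one.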

simonw commented on September 28, 2024

Most interesting new implementation here will be the code that knows how to output something like this in a streaming fashion, without buffering it all in an array first:

[
  {
    "Name": "aws-cloudtrail-logs-462092780466-f2c900d3",
    "CreationDate": "2021-03-25 22:19:54+00:00"
  },
  {
    "Name": "aws-sam-cli-managed-default-samclisourcebucket-1ksajo4h62s07",
    "CreationDate": "2020-06-16 23:13:34+00:00"
  },
  {
    "Name": "blah-bucket-blah",
    "CreationDate": "2021-11-10 23:50:08+00:00"
  }
]

The trick will be to output [ at the start, then each row as json.dumps(..., indent=2) indented by a further two spaces (with textwrap), with a comma after every row except the last one, and then a ] at the end.

simonw commented on September 28, 2024

Good code to imitate from sqlite-utils: https://github.com/simonw/sqlite-utils/blob/74586d3cb26fa3cc3412721985ecdc1864c2a31d/sqlite_utils/cli.py#L1589-L1623 - in particular this CSV/TSV bit:

writer = csv_std.writer(sys.stdout, dialect="excel-tab" if tsv else "excel")
writer.writerow(headers)
for row in cursor:
    writer.writerow(row)

Also https://github.com/simonw/sqlite-utils/blob/74586d3cb26fa3cc3412721985ecdc1864c2a31d/sqlite_utils/cli.py#L2489-L2514

def output_rows(iterator, headers, nl, arrays, json_cols):
    # We have to iterate two-at-a-time so we can know if we
    # should output a trailing comma or if we have reached
    # the last row.
    current_iter, next_iter = itertools.tee(iterator, 2)
    next(next_iter, None)
    first = True
    for row, next_row in itertools.zip_longest(current_iter, next_iter):
        is_last = next_row is None
        data = row
        if json_cols:
            # Any value that is a valid JSON string should be treated as JSON
            data = [maybe_json(value) for value in data]
        if not arrays:
            data = dict(zip(headers, data))
        line = "{firstchar}{serialized}{maybecomma}{lastchar}".format(
            firstchar=("[" if first else " ") if not nl else "",
            serialized=json.dumps(data, default=json_binary),
            maybecomma="," if (not nl and not is_last) else "",
            lastchar="]" if (is_last and not nl) else "",
        )
        yield line
        first = False
    if first:
        # We didn't output any rows, so yield the empty list
        yield "[]"

simonw commented on September 28, 2024

Actually I probably want to use csv.DictWriter here:

writer = csv.DictWriter(sys.stdout, headers)
writer.writeheader()
writer.writerows(iterator_of_dictionaries)

simonw commented on September 28, 2024

This outputs 2-indented JSON in a streaming fashion:

import itertools
import json
import textwrap


def output_rows_json(iterator):
    # We have to iterate two-at-a-time so we can know if we
    # should output a trailing comma or if we have reached
    # the last row.
    current_iter, next_iter = itertools.tee(iterator, 2)
    next(next_iter, None)
    first = True
    for row, next_row in itertools.zip_longest(current_iter, next_iter):
        is_last = next_row is None
        data = row
        line = "{firstchar}{serialized}{maybecomma}{lastchar}".format(
            firstchar="[\n" if first else "",
            serialized=textwrap.indent(json.dumps(data, indent=2, default=repr), '  '),
            maybecomma="," if not is_last else "",
            lastchar="\n]" if is_last else "",
        )
        yield line
        first = False
    if first:
        # We didn't output any rows, so yield the empty list
        yield "[]"

Demo:

print("\n".join(output_rows_json([{"id": 1, "name": "Simon"}, {"id": 2, "name": "Cleo"}, {"id": 3, "name": "Azi"}])))

[
  {
    "id": 1,
    "name": "Simon"
  },
  {
    "id": 2,
    "name": "Cleo"
  },
  {
    "id": 3,
    "name": "Azi"
  }
]

simonw commented on September 28, 2024

Turned that into a TIL: https://til.simonwillison.net/python/output-json-array-streaming

simonw commented on September 28, 2024

Ran into a problem applying this to list-users:

% s3-credentials list-users --csv
Path,UserName,UserId,Arn,CreateDate
... many rows follow ...
Traceback (most recent call last):
  File "/Users/simon/.local/share/virtualenvs/s3-credentials-J8M1ChYK/bin/s3-credentials", line 33, in <module>
    sys.exit(load_entry_point('s3-credentials', 'console_scripts', 's3-credentials')())
  File "/Users/simon/.local/share/virtualenvs/s3-credentials-J8M1ChYK/lib/python3.10/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/Users/simon/.local/share/virtualenvs/s3-credentials-J8M1ChYK/lib/python3.10/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/Users/simon/.local/share/virtualenvs/s3-credentials-J8M1ChYK/lib/python3.10/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/simon/.local/share/virtualenvs/s3-credentials-J8M1ChYK/lib/python3.10/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/simon/.local/share/virtualenvs/s3-credentials-J8M1ChYK/lib/python3.10/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/Users/simon/Dropbox/Development/s3-credentials/s3_credentials/cli.py", line 495, in list_users
    output(iterate(), nl, csv, tsv)
  File "/Users/simon/Dropbox/Development/s3-credentials/s3_credentials/cli.py", line 789, in output
    writer.writerows(itertools.chain([first], iterator))
  File "/Users/simon/.pyenv/versions/3.10.0/lib/python3.10/csv.py", line 157, in writerows
    return self.writer.writerows(map(self._dict_to_list, rowdicts))
  File "/Users/simon/.pyenv/versions/3.10.0/lib/python3.10/csv.py", line 149, in _dict_to_list
    raise ValueError("dict contains fields not in fieldnames: "
ValueError: dict contains fields not in fieldnames: 'PasswordLastUsed'

CSV output failed because one of the later rows had a new unexpected column.
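
The failure can be reproduced in isolation. This is a minimal sketch with made-up rows (the values are illustrative, not real IAM data), taking the field names from the first row as the writerows call in the traceback suggests:

```python
import csv
import io
import itertools

# Made-up rows: the second one carries a key the first one lacks
rows = iter([
    {"UserName": "a", "UserId": "1"},
    {"UserName": "b", "UserId": "2", "PasswordLastUsed": "2021-11-01"},
])

first = next(rows)
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(first.keys()))
writer.writeheader()

error = None
try:
    writer.writerows(itertools.chain([first], rows))
except ValueError as ex:
    # DictWriter raises on unknown keys by default (extrasaction="raise")
    error = str(ex)

print(out.getvalue())  # header and the first row were already written
print(error)
```

By the time the exception is raised, the header and the earlier rows have already been written to the output.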

simonw commented on September 28, 2024

Options for fixing this:

  • Silently ignore columns that were not in the first record. Easiest fix.
  • Ignore the unexpected columns while writing the output, but keep track of them and show a warning at the end. Bit ugly.
  • For CSV mode load everything into memory first to check for the maximum set of headers. This breaks the goal of having this work efficiently with the streamed data.
  • Figure out the full set of possible columns and hard-code that into the application. Probably the best solution?
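
The first option needs no custom code, because csv.DictWriter accepts extrasaction="ignore". A minimal sketch with made-up rows, assuming the field names come from the first record:

```python
import csv
import io

# Made-up rows: the second one has a column the header does not know about
rows = [
    {"UserName": "a", "UserId": "1"},
    {"UserName": "b", "UserId": "2", "PasswordLastUsed": "2021-11-01"},
]

out = io.StringIO()
writer = csv.DictWriter(
    out,
    fieldnames=list(rows[0].keys()),
    extrasaction="ignore",  # drop unknown keys instead of raising ValueError
)
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

The cost is that the PasswordLastUsed value is dropped without any indication, which is why this is the easiest fix rather than the best one.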

simonw commented on September 28, 2024

I considered an option where it spots the error, runs to the end to capture all possible headers, then runs the entire command again, but that wouldn't work because we would already have written the headers and previous rows to stdout.

simonw commented on September 28, 2024

I'm going to hard-code in the list of known columns. This also gives me control over the order in which they are output.

For list-users that's https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/iam.html#IAM.Client.list_users

  • UserName
  • UserId
  • Arn
  • Path
  • CreateDate
  • PasswordLastUsed
  • PermissionsBoundary
  • Tags
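
A sketch of how a hard-coded column list could feed csv.DictWriter. The names LIST_USERS_COLUMNS and write_users are hypothetical, and the sample user is made up. Columns missing from a row come out blank via restval, and the header order follows the list rather than the order of keys in the API response:

```python
import csv
import io

# Column order taken from the list above
LIST_USERS_COLUMNS = [
    "UserName",
    "UserId",
    "Arn",
    "Path",
    "CreateDate",
    "PasswordLastUsed",
    "PermissionsBoundary",
    "Tags",
]


def write_users(users, out, tsv=False):
    """Stream user dicts to `out` as CSV or TSV with a fixed header."""
    writer = csv.DictWriter(
        out,
        fieldnames=LIST_USERS_COLUMNS,
        restval="",  # value used for columns a row does not have
        dialect="excel-tab" if tsv else "excel",
    )
    writer.writeheader()
    for user in users:
        writer.writerow(user)


out = io.StringIO()
write_users(
    iter([
        {
            "Path": "/",
            "UserName": "example-user",
            "UserId": "AIDAEXAMPLE",
            "Arn": "arn:aws:iam::123456789012:user/example-user",
            "CreateDate": "2021-11-03 18:31:22+00:00",
        }
    ]),
    out,
)
print(out.getvalue())
```

With the default extrasaction="raise", a key outside the hard-coded list would still raise ValueError, so the list has to cover every field the API can return.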

simonw commented on September 28, 2024

Fun trick with this:

% s3-credentials list-users --tsv | sqlite-utils memory stdin:tsv 'select * from stdin' -t
UserName                                               UserId                 Arn                                                                                   Path    CreateDate                 PasswordLastUsed           PermissionsBoundary    Tags
-----------------------------------------------------  ---------------------  ------------------------------------------------------------------------------------  ------  -------------------------  -------------------------  ---------------------  ------
custom-policy                                          AIDAWXFXAIOZNQQMEOHUA  arn:aws:iam::462092780466:user/custom-policy                                          /       2021-11-03 18:31:22+00:00
dogsheep-photos-simon-read                             AIDAWXFXAIOZKDDGOUY5H  arn:aws:iam::462092780466:user/dogsheep-photos-simon-read                             /       2020-04-18 19:56:54+00:00

simonw commented on September 28, 2024

OK, this is done for list-users and list-buckets and list-bucket.

list-user-policies doesn't output JSON at all, it has a weird custom output - so I'm leaving it for the moment.
