Standard default output should be a valid JSON array about s3-credentials HOT 15 CLOSED

simonw commented on September 28, 2024

Standard default output should be a valid JSON array

from s3-credentials.

Comments (15)

simonw commented on September 28, 2024

The current documented display options are:

Add --nl to collapse these to single lines as valid newline-delimited JSON.

Add --array to output a valid JSON array of objects instead.

Defaulting to pretty-printed invalid newline-delimited JSON was a weird design choice that I made!

% s3-credentials list-buckets
{
    "Name": "aws-cloudtrail-logs-462092780466-f2c900d3",
    "CreationDate": "2021-03-25 22:19:54+00:00"
}
{
    "Name": "simonw-test-bucket-for-s3-credentials",
    "CreationDate": "2021-11-03 21:46:12+00:00"
}

Fixing this would require a major version bump if I had hit 1.0 already.

from s3-credentials.

simonw commented on September 28, 2024

In that case I think the safest default would be to turn the above default into a pretty-printed streaming JSON array:

% s3-credentials list-buckets
[
  {
    "Name": "aws-cloudtrail-logs-462092780466-f2c900d3",
    "CreationDate": "2021-03-25 22:19:54+00:00"
  },
  {
    "Name": "simonw-test-bucket-for-s3-credentials",
    "CreationDate": "2021-11-03 21:46:12+00:00"
  }
]

Then drop the --array option but keep --nl, which would output this (already implemented):

% s3-credentials list-buckets --nl
{"Name": "aws-cloudtrail-logs-462092780466-f2c900d3", "CreationDate": "2021-03-25 22:19:54+00:00"}
{"Name": "simonw-test-bucket-for-s3-credentials", "CreationDate": "2021-11-03 21:46:12+00:00"}

And add a new --csv option.

from s3-credentials.

simonw commented on September 28, 2024

Getting this right will mean I can pipe into sqlite-utils insert easily to create a SQLite database, which would be fun.

Actually this works already:

s3-credentials list-buckets --nl | sqlite-utils insert /tmp/s3.db buckets - --nl

from s3-credentials.

simonw commented on September 28, 2024

Here's how the current list-buckets implementation works:

s3-credentials/s3_credentials/cli.py

Lines 556 to 564 in aa69024

    if array:
        gathered.append(bucket)
    else:
        if nl:
            click.echo(json.dumps(bucket, default=str))
        else:
            click.echo(json.dumps(bucket, indent=4, default=str))
if gathered:
    click.echo(json.dumps(gathered, indent=4, default=str))

I need a new abstraction I can call that knows how to turn an iterator of rows into one of the desired formats.
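
A minimal sketch of what such an abstraction might look like, assuming rows arrive as an iterator of dictionaries. The function name `output` and the `nl` flag echo names that appear elsewhere in this thread, but the body, the `as_csv` / `as_tsv` / `stream` parameters, and the sample row are assumptions made for illustration, not the project's actual code. The default branch buffers everything into a list purely for brevity; it does not attempt the streaming array output.

```python
import csv
import io
import itertools
import json
import sys


def output(rows, nl=False, as_csv=False, as_tsv=False, stream=None):
    """Write an iterator of dicts to `stream` in one of the supported formats.

    Hypothetical helper: parameter names are illustrative only.
    """
    if stream is None:
        stream = sys.stdout
    rows = iter(rows)
    if as_csv or as_tsv:
        # Peek at the first row to discover the column names.
        first = next(rows, None)
        if first is None:
            return  # no rows, so nothing to write (not even a header)
        writer = csv.DictWriter(
            stream,
            fieldnames=list(first.keys()),
            dialect="excel-tab" if as_tsv else "excel",
        )
        writer.writeheader()
        # Put the peeked row back in front of the remaining rows.
        writer.writerows(itertools.chain([first], rows))
    elif nl:
        # Newline-delimited JSON: one compact object per line.
        for row in rows:
            stream.write(json.dumps(row, default=str) + "\n")
    else:
        # Assumption for this sketch only: buffer and dump a single array.
        stream.write(json.dumps(list(rows), indent=2, default=str) + "\n")


# Made-up sample row, written to stdout as newline-delimited JSON.
output([{"Name": "example-bucket", "CreationDate": "2021-03-25"}], nl=True)
```

Keeping the format decision in one place means each command only has to produce an iterator of dictionaries.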

from s3-credentials.

simonw commented on September 28, 2024

Most interesting new implementation here will be the code that knows how to output something like this in a streaming fashion, without buffering it all in an array first:

[
  {
    "Name": "aws-cloudtrail-logs-462092780466-f2c900d3",
    "CreationDate": "2021-03-25 22:19:54+00:00"
  },
  {
    "Name": "aws-sam-cli-managed-default-samclisourcebucket-1ksajo4h62s07",
    "CreationDate": "2020-06-16 23:13:34+00:00"
  },
  {
    "Name": "blah-bucket-blah",
    "CreationDate": "2021-11-10 23:50:08+00:00"
  }
]

The trick will be to output [ at the start, then each row serialized with json.dumps(..., indent=2) and indented a further two spaces using textwrap.indent(), with a comma after every row except the last, and then a ] at the end.
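
One possible way to sketch that trick without any look-ahead is to emit the separator before every row except the first, rather than after every row except the last. This is a variant written for illustration, not code from this project; the name `stream_json_array` is made up, and `default=str` is an assumption carried over from the existing list-buckets code.

```python
import json
import textwrap


def stream_json_array(rows):
    """Yield text chunks that together form a 2-space-indented JSON array."""
    first = True
    for row in rows:
        # Open the array before the first row; separate later rows with a comma.
        yield "[\n" if first else ",\n"
        # Serialize one row, then push it two spaces to the right.
        yield textwrap.indent(json.dumps(row, indent=2, default=str), "  ")
        first = False
    # Close the array, or emit an empty array if there were no rows at all.
    yield "[]" if first else "\n]"


# Made-up sample rows; chunks are joined without a separator.
print("".join(stream_json_array([{"id": 1}, {"id": 2}])))
```

Only one row is held in memory at a time, so this works with a generator that pages through an API.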

from s3-credentials.

simonw commented on September 28, 2024

Good code to imitate from sqlite-utils: https://github.com/simonw/sqlite-utils/blob/74586d3cb26fa3cc3412721985ecdc1864c2a31d/sqlite_utils/cli.py#L1589-L1623 - in particular this CSV/TSV bit:

writer = csv_std.writer(sys.stdout, dialect="excel-tab" if tsv else "excel")
writer.writerow(headers)
for row in cursor:
    writer.writerow(row)

Also https://github.com/simonw/sqlite-utils/blob/74586d3cb26fa3cc3412721985ecdc1864c2a31d/sqlite_utils/cli.py#L2489-L2514

def output_rows(iterator, headers, nl, arrays, json_cols):
    # We have to iterate two-at-a-time so we can know if we
    # should output a trailing comma or if we have reached
    # the last row.
    current_iter, next_iter = itertools.tee(iterator, 2)
    next(next_iter, None)
    first = True
    for row, next_row in itertools.zip_longest(current_iter, next_iter):
        is_last = next_row is None
        data = row
        if json_cols:
            # Any value that is a valid JSON string should be treated as JSON
            data = [maybe_json(value) for value in data]
        if not arrays:
            data = dict(zip(headers, data))
        line = "{firstchar}{serialized}{maybecomma}{lastchar}".format(
            firstchar=("[" if first else " ") if not nl else "",
            serialized=json.dumps(data, default=json_binary),
            maybecomma="," if (not nl and not is_last) else "",
            lastchar="]" if (is_last and not nl) else "",
        )
        yield line
        first = False
    if first:
        # We didn't output any rows, so yield the empty list
        yield "[]"

from s3-credentials.

simonw commented on September 28, 2024

Actually I probably want to use csv.DictWriter here:

writer = csv.DictWriter(sys.stdout, headers)
writer.writeheader()
writer.writerows(iterator_of_dictionaries)

from s3-credentials.

simonw commented on September 28, 2024

This outputs 2-indented JSON in a streaming fashion:

import itertools
import json
import textwrap

def output_rows_json(iterator):
    # We have to iterate two-at-a-time so we can know if we
    # should output a trailing comma or if we have reached
    # the last row.
    current_iter, next_iter = itertools.tee(iterator, 2)
    next(next_iter, None)
    first = True
    for row, next_row in itertools.zip_longest(current_iter, next_iter):
        is_last = next_row is None
        data = row
        line = "{firstchar}{serialized}{maybecomma}{lastchar}".format(
            firstchar="[\n" if first else "",
            serialized=textwrap.indent(json.dumps(data, indent=2, default=repr), '  '),
            maybecomma="," if not is_last else "",
            lastchar="\n]" if is_last else "",
        )
        yield line
        first = False
    if first:
        # We didn't output any rows, so yield the empty list
        yield "[]"

Demo:

print("\n".join(output_rows_json([{"id": 1, "name": "Simon"}, {"id": 2, "name": "Cleo"}, {"id": 3, "name": "Azi"}])))

[
  {
    "id": 1,
    "name": "Simon"
  },
  {
    "id": 2,
    "name": "Cleo"
  },
  {
    "id": 3,
    "name": "Azi"
  }
]

from s3-credentials.

simonw commented on September 28, 2024

Turned that into a TIL: https://til.simonwillison.net/python/output-json-array-streaming

from s3-credentials.

simonw commented on September 28, 2024

Ran into a problem applying this to list-users:

% s3-credentials list-users --csv
Path,UserName,UserId,Arn,CreateDate
... many rows follow ...
Traceback (most recent call last):
  File "/Users/simon/.local/share/virtualenvs/s3-credentials-J8M1ChYK/bin/s3-credentials", line 33, in <module>
    sys.exit(load_entry_point('s3-credentials', 'console_scripts', 's3-credentials')())
  File "/Users/simon/.local/share/virtualenvs/s3-credentials-J8M1ChYK/lib/python3.10/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/Users/simon/.local/share/virtualenvs/s3-credentials-J8M1ChYK/lib/python3.10/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/Users/simon/.local/share/virtualenvs/s3-credentials-J8M1ChYK/lib/python3.10/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/simon/.local/share/virtualenvs/s3-credentials-J8M1ChYK/lib/python3.10/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/simon/.local/share/virtualenvs/s3-credentials-J8M1ChYK/lib/python3.10/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/Users/simon/Dropbox/Development/s3-credentials/s3_credentials/cli.py", line 495, in list_users
    output(iterate(), nl, csv, tsv)
  File "/Users/simon/Dropbox/Development/s3-credentials/s3_credentials/cli.py", line 789, in output
    writer.writerows(itertools.chain([first], iterator))
  File "/Users/simon/.pyenv/versions/3.10.0/lib/python3.10/csv.py", line 157, in writerows
    return self.writer.writerows(map(self._dict_to_list, rowdicts))
  File "/Users/simon/.pyenv/versions/3.10.0/lib/python3.10/csv.py", line 149, in _dict_to_list
    raise ValueError("dict contains fields not in fieldnames: "
ValueError: dict contains fields not in fieldnames: 'PasswordLastUsed'

CSV output failed because one of the later rows had a new unexpected column.
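
The failure can be reproduced in isolation. The sketch below assumes the header is taken from the first row and the rest are streamed through `itertools.chain`, as the `writerows` line in the traceback suggests; the helper name `write_csv_from_first_row` and the two sample users are made up.

```python
import csv
import io
import itertools


def write_csv_from_first_row(rows, stream):
    """Derive the CSV header from the first row, then stream every row."""
    rows = iter(rows)
    first = next(rows)
    writer = csv.DictWriter(stream, fieldnames=list(first.keys()))
    writer.writeheader()
    writer.writerows(itertools.chain([first], rows))


# Made-up sample data: only the second user has PasswordLastUsed.
users = [
    {"UserName": "a", "UserId": "1"},
    {"UserName": "b", "UserId": "2", "PasswordLastUsed": "2021-11-01"},
]

stream = io.StringIO()
error = None
try:
    write_csv_from_first_row(users, stream)
except ValueError as ex:
    # DictWriter raises by default when a row has keys outside fieldnames.
    error = str(ex)

print(error)
# The header and the first row were already written before the failure.
print(stream.getvalue())
```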

from s3-credentials.

simonw commented on September 28, 2024

Options for fixing this:

Silently ignore columns that were not in the first record. Easiest fix.
Watch for rows with unexpected columns, leave those columns out while streaming, and show warnings about them at the end. Bit ugly.
For CSV mode load everything into memory first to check for the maximum set of headers. This breaks the goal of having this work efficiently with the streamed data.
Figure out the full set of possible columns and hard-code that into the application. Probably the best solution?
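
As a rough illustration of the first option, `csv.DictWriter` accepts `extrasaction="ignore"`, which drops keys that are not in `fieldnames` instead of raising. The sample rows below are made up, and this sketch only shows the behaviour; it is not a claim about how the tool handles it.

```python
import csv
import io

# Made-up sample data: the second row has a column the first one lacks.
rows = [
    {"UserName": "a", "UserId": "1"},
    {"UserName": "b", "UserId": "2", "PasswordLastUsed": "2021-11-01"},
]

stream = io.StringIO()
# The header comes from the first record; extrasaction="ignore" silently
# drops any later keys that are not part of that header.
writer = csv.DictWriter(stream, fieldnames=list(rows[0]), extrasaction="ignore")
writer.writeheader()
writer.writerows(rows)

print(stream.getvalue())
```

The cost is visible in the output: the PasswordLastUsed value is lost without any indication.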

from s3-credentials.

simonw commented on September 28, 2024

I considered an option where it spots the error, runs to the end to capture all possible headers, then runs the entire command again - but that wouldn't work because we would already have written the headers and earlier rows to stdout.

from s3-credentials.

simonw commented on September 28, 2024

I'm going to hard-code in the list of known columns. This also gives me control over the order in which they are output.

For list-users that's https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/iam.html#IAM.Client.list_users

UserName
UserId
Arn
Path
CreateDate
PasswordLastUsed
PermissionsBoundary
Tags
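
A minimal sketch of the hard-coded approach, using the column order listed above. The constant name `LIST_USERS_COLUMNS`, the helper `write_users`, and the sample rows are hypothetical; `restval` is the standard `csv.DictWriter` parameter for filling in keys a row does not have.

```python
import csv
import io

# Column order copied from the list above.
LIST_USERS_COLUMNS = [
    "UserName",
    "UserId",
    "Arn",
    "Path",
    "CreateDate",
    "PasswordLastUsed",
    "PermissionsBoundary",
    "Tags",
]


def write_users(rows, stream, dialect="excel"):
    """Stream rows as CSV (or TSV with dialect="excel-tab") using fixed columns."""
    writer = csv.DictWriter(
        stream,
        fieldnames=LIST_USERS_COLUMNS,
        restval="",  # rows missing a column get an empty cell
        dialect=dialect,
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Made-up sample rows with different subsets of the columns.
stream = io.StringIO()
write_users(
    [
        {"UserName": "custom-policy", "Path": "/"},
        {"UserName": "other", "Path": "/", "PasswordLastUsed": "2021-11-01"},
    ],
    stream,
)
print(stream.getvalue())
```

Because `extrasaction` is left at its default of `"raise"`, a key outside this list would still produce a `ValueError`, which at least makes a newly added API field noticeable.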

from s3-credentials.

simonw commented on September 28, 2024

Fun trick with this:

% s3-credentials list-users --tsv | sqlite-utils memory stdin:tsv 'select * from stdin' -t
UserName                                               UserId                 Arn                                                                                   Path    CreateDate                 PasswordLastUsed           PermissionsBoundary    Tags
-----------------------------------------------------  ---------------------  ------------------------------------------------------------------------------------  ------  -------------------------  -------------------------  ---------------------  ------
custom-policy                                          AIDAWXFXAIOZNQQMEOHUA  arn:aws:iam::462092780466:user/custom-policy                                          /       2021-11-03 18:31:22+00:00
dogsheep-photos-simon-read                             AIDAWXFXAIOZKDDGOUY5H  arn:aws:iam::462092780466:user/dogsheep-photos-simon-read                             /       2020-04-18 19:56:54+00:00

from s3-credentials.

simonw commented on September 28, 2024

OK, this is done for list-users and list-buckets and list-bucket.

list-user-policies doesn't output JSON at all; it has a weird custom output, so I'm leaving it for the moment.

from s3-credentials.
