s3tk

A security toolkit for Amazon S3

Another day, another leaky Amazon S3 bucket

— The Register, 12 Jul 2017

Don’t be the... next... big... data... leak

Screenshot

🍊 Battle-tested at Instacart

Installation

Run:

pip install s3tk

You can use the AWS CLI or AWS Vault to set up your AWS credentials:

pip install awscli
aws configure
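
Or, with AWS Vault (a sketch; your-profile is a placeholder profile name):

aws-vault add your-profile
aws-vault exec your-profile -- s3tk scan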

See IAM policies needed for each command.

Commands

Scan

Scan your buckets for:

  • ACL open to public
  • policy open to public
  • public access blocked
  • logging enabled
  • versioning enabled
  • default encryption enabled

s3tk scan

Only run on specific buckets

s3tk scan my-bucket my-bucket-2

Also works with wildcards

s3tk scan "my-bucket*"

Confirm correct log bucket(s) and prefix

s3tk scan --log-bucket my-s3-logs --log-bucket other-region-logs --log-prefix "{bucket}/"

Check CloudTrail object-level logging [experimental]

s3tk scan --object-level-logging

Skip logging, versioning, or default encryption

s3tk scan --skip-logging --skip-versioning --skip-default-encryption

Get email notifications of failures (via SNS)

s3tk scan --sns-topic arn:aws:sns:...

List Policy

List bucket policies

s3tk list-policy

Only run on specific buckets

s3tk list-policy my-bucket my-bucket-2

Show named statements

s3tk list-policy --named

Set Policy

Note: This replaces the previous policy

Only private uploads (see Bucket Policies below for the policy this sets)

s3tk set-policy my-bucket --no-object-acl

Delete Policy

Delete policy

s3tk delete-policy my-bucket

Block Public Access

Block public access on specific buckets

s3tk block-public-access my-bucket my-bucket-2

Use the --dry-run flag to test
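
For example, to preview changes without applying them:

s3tk block-public-access my-bucket --dry-run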

Enable Logging

Enable logging on all buckets

s3tk enable-logging --log-bucket my-s3-logs

Only on specific buckets

s3tk enable-logging my-bucket my-bucket-2 --log-bucket my-s3-logs

Set log prefix ({bucket}/ by default)

s3tk enable-logging --log-bucket my-s3-logs --log-prefix "logs/{bucket}/"

Use the --dry-run flag to test

A few notes about logging:

  • buckets with logging already enabled are not updated at all
  • the log bucket must be in the same region as the source bucket - run this command multiple times for different regions
  • it can take over an hour for logs to show up

Enable Versioning

Enable versioning on all buckets

s3tk enable-versioning

Only on specific buckets

s3tk enable-versioning my-bucket my-bucket-2

Use the --dry-run flag to test

Enable Default Encryption

Enable default encryption on all buckets

s3tk enable-default-encryption

Only on specific buckets

s3tk enable-default-encryption my-bucket my-bucket-2

This does not encrypt existing objects - use the encrypt command for this

Use the --dry-run flag to test

Scan Object ACL

Scan ACL on all objects in a bucket

s3tk scan-object-acl my-bucket

Only certain objects

s3tk scan-object-acl my-bucket --only "*.pdf"

Except certain objects

s3tk scan-object-acl my-bucket --except "*.jpg"

Reset Object ACL

Reset ACL on all objects in a bucket

s3tk reset-object-acl my-bucket

This makes all objects private. See bucket policies for how to enforce going forward.

Use the --dry-run flag to test

Specify certain objects the same way as scan-object-acl

Encrypt

Encrypt all objects in a bucket with server-side encryption

s3tk encrypt my-bucket

Uses S3-managed keys by default. For KMS-managed keys, use:

s3tk encrypt my-bucket --kms-key-id arn:aws:kms:...

For customer-provided keys, use:

s3tk encrypt my-bucket --customer-key secret-key

Use the --dry-run flag to test

Specify certain objects the same way as scan-object-acl

Note: Objects will lose any custom ACL

Delete Unencrypted Versions

Delete all unencrypted versions of objects in a bucket

s3tk delete-unencrypted-versions my-bucket

For safety, this will not delete any current versions of objects

Use the --dry-run flag to test

Specify certain objects the same way as scan-object-acl

Scan DNS

Scan Route 53 for buckets to make sure you own them

s3tk scan-dns

Otherwise, you may be susceptible to subdomain takeover

Credentials

Credentials can be specified in ~/.aws/credentials or with environment variables. See this guide for an explanation of environment variables.

You can specify a profile to use with:

AWS_PROFILE=your-profile s3tk
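
Or set the standard AWS environment variables directly (values elided):

AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=... s3tk scan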

IAM Policies

Here are the permissions needed for each command. Only include statements you need.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Scan",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:GetBucketAcl",
                "s3:GetBucketPolicy",
                "s3:GetBucketPublicAccessBlock",
                "s3:GetBucketLogging",
                "s3:GetBucketVersioning",
                "s3:GetEncryptionConfiguration"
            ],
            "Resource": "*"
        },
        {
            "Sid": "ScanObjectLevelLogging",
            "Effect": "Allow",
            "Action": [
                "cloudtrail:ListTrails",
                "cloudtrail:GetTrail",
                "cloudtrail:GetEventSelectors",
                "s3:GetBucketLocation"
            ],
            "Resource": "*"
        },
        {
            "Sid": "ScanDNS",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "route53:ListHostedZones",
                "route53:ListResourceRecordSets"
            ],
            "Resource": "*"
        },
        {
            "Sid": "ListPolicy",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:GetBucketPolicy"
            ],
            "Resource": "*"
        },
        {
            "Sid": "SetPolicy",
            "Effect": "Allow",
            "Action": [
                "s3:PutBucketPolicy"
            ],
            "Resource": "*"
        },
        {
            "Sid": "DeletePolicy",
            "Effect": "Allow",
            "Action": [
                "s3:DeleteBucketPolicy"
            ],
            "Resource": "*"
        },
        {
            "Sid": "BlockPublicAccess",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:PutBucketPublicAccessBlock"
            ],
            "Resource": "*"
        },
        {
            "Sid": "EnableLogging",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:PutBucketLogging"
            ],
            "Resource": "*"
        },
        {
            "Sid": "EnableVersioning",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:PutBucketVersioning"
            ],
            "Resource": "*"
        },
        {
            "Sid": "EnableDefaultEncryption",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:PutEncryptionConfiguration"
            ],
            "Resource": "*"
        },
        {
            "Sid": "ResetObjectAcl",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObjectAcl",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::my-bucket",
                "arn:aws:s3:::my-bucket/*"
            ]
        },
        {
            "Sid": "Encrypt",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::my-bucket",
                "arn:aws:s3:::my-bucket/*"
            ]
        },
        {
            "Sid": "DeleteUnencryptedVersions",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucketVersions",
                "s3:GetObjectVersion",
                "s3:DeleteObjectVersion"
            ],
            "Resource": [
                "arn:aws:s3:::my-bucket",
                "arn:aws:s3:::my-bucket/*"
            ]
        }
    ]
}

Access Logs

Amazon Athena is great for querying S3 logs. Create a table (thanks to this post for the table structure) with:

CREATE EXTERNAL TABLE my_bucket (
    bucket_owner string,
    bucket string,
    time string,
    remote_ip string,
    requester string,
    request_id string,
    operation string,
    key string,
    request_verb string,
    request_url string,
    request_proto string,
    status_code string,
    error_code string,
    bytes_sent string,
    object_size string,
    total_time string,
    turn_around_time string,
    referrer string,
    user_agent string,
    version_id string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
    'serialization.format' = '1',
    'input.regex' = '([^ ]*) ([^ ]*) \\[(.*?)\\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) \\\"([^ ]*) ([^ ]*) (- |[^ ]*)\\\" (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\") ([^ ]*)$'
) LOCATION 's3://my-s3-logs/my-bucket/';

Change the last line to point to your log bucket (and prefix) and query away. The example below finds successful anonymous requests (S3 logs '-' as the requester for unauthenticated requests):

SELECT
    date_parse(time, '%d/%b/%Y:%H:%i:%S +0000') AS time,
    request_url,
    remote_ip,
    user_agent
FROM
    my_bucket
WHERE
    requester = '-'
    AND status_code LIKE '2%'
    AND request_url LIKE '/some-keys%'
ORDER BY 1
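
As another sketch against the same table, list the most-requested URLs for a single client (203.0.113.10 is a placeholder IP):

SELECT
    request_url,
    count(*) AS requests
FROM
    my_bucket
WHERE
    remote_ip = '203.0.113.10'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10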

CloudTrail Logs

Amazon Athena is also great for querying CloudTrail logs. Create a table (thanks to this post for the table structure) with:

CREATE EXTERNAL TABLE cloudtrail_logs (
    eventversion STRING,
    userIdentity STRUCT<
        type:STRING,
        principalid:STRING,
        arn:STRING,
        accountid:STRING,
        invokedby:STRING,
        accesskeyid:STRING,
        userName:STRING,
        sessioncontext:STRUCT<
            attributes:STRUCT<
                mfaauthenticated:STRING,
                creationdate:STRING>,
            sessionIssuer:STRUCT<
                type:STRING,
                principalId:STRING,
                arn:STRING,
                accountId:STRING,
                userName:STRING>>>,
    eventTime STRING,
    eventSource STRING,
    eventName STRING,
    awsRegion STRING,
    sourceIpAddress STRING,
    userAgent STRING,
    errorCode STRING,
    errorMessage STRING,
    requestId  STRING,
    eventId  STRING,
    resources ARRAY<STRUCT<
        ARN:STRING,
        accountId:STRING,
        type:STRING>>,
    eventType STRING,
    apiVersion  STRING,
    readOnly BOOLEAN,
    recipientAccountId STRING,
    sharedEventID STRING,
    vpcEndpointId STRING,
    requestParameters STRING,
    responseElements STRING,
    additionalEventData STRING,
    serviceEventDetails STRING
)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED  AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://my-cloudtrail-logs/';

Change the last line to point to your CloudTrail log bucket and query away. The example below lists bucket-related events:

SELECT
    eventTime,
    eventName,
    userIdentity.userName,
    requestParameters
FROM
    cloudtrail_logs
WHERE
    eventName LIKE '%Bucket%'
ORDER BY 1
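
As another sketch, surface failed bucket-level calls (errorCode is populated only when a request fails):

SELECT
    eventTime,
    eventName,
    errorCode,
    sourceIpAddress
FROM
    cloudtrail_logs
WHERE
    eventName LIKE '%Bucket%'
    AND errorCode IS NOT NULL
ORDER BY 1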

Best Practices

Keep things simple and follow the principle of least privilege to reduce the chance of mistakes.

  • Strictly limit who can perform bucket-related operations
  • Avoid mixing objects with different permissions in the same bucket (use a bucket policy to enforce this)
  • Don’t specify public read permissions at the bucket level (no GetObject in bucket policy)
  • Monitor configuration frequently for changes

Bucket Policies

Only private uploads

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObjectAcl",
            "Resource": "arn:aws:s3:::my-bucket/*"
        }
    ]
}
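
As another example (a sketch of a common AWS pattern, not something s3tk sets itself), deny uploads that don't request server-side encryption:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::my-bucket/*",
            "Condition": {
                "Null": {
                    "s3:x-amz-server-side-encryption": "true"
                }
            }
        }
    ]
}

With this policy, clients must send the x-amz-server-side-encryption header on every upload.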

Performance

For commands that iterate over bucket objects (scan-object-acl, reset-object-acl, encrypt, and delete-unencrypted-versions), run s3tk on an EC2 server for minimum latency.

Notes

The set-policy, block-public-access, enable-logging, enable-versioning, and enable-default-encryption commands are provided for convenience. We recommend Terraform for managing your buckets.

resource "aws_s3_bucket" "my_bucket" {
  bucket = "my-bucket"
  acl    = "private"

  logging {
    target_bucket = "my-s3-logs"
    target_prefix = "my-bucket/"
  }

  versioning {
    enabled = true
  }
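
  # Default encryption, mirroring s3tk enable-default-encryption (a sketch,
  # assuming a pre-4.0 AWS provider where this was an inline block)
  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "AES256"
      }
    }
  }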
}

resource "aws_s3_bucket_public_access_block" "my_bucket" {
  bucket = "${aws_s3_bucket.my_bucket.id}"

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

Upgrading

Run:

pip install s3tk --upgrade

To use master, run:

pip install git+https://github.com/ankane/s3tk.git --upgrade

Docker

Run:

docker run -it ankane/s3tk aws configure

Commit your credentials:

docker commit $(docker ps -l -q) my-s3tk

And run:

docker run -it my-s3tk s3tk scan

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

  • report bugs
  • fix bugs and submit pull requests
  • write, clarify, or fix documentation
  • suggest or add new features

To get started with development:

git clone https://github.com/ankane/s3tk.git
cd s3tk
pip install -r requirements.txt


s3tk's Issues

scan-object-acl fails with traceback

This looks like a really useful tool, thanks for writing and maintaining it!

I built a Docker image from a Git checkout and got this on scan-object-acl, for a bucket that exists:

$ docker run  -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY s3tk s3tk scan-object-acl foobucket

joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/joblib-0.12.5-py3.7.egg/joblib/externals/loky/process_executor.py", line 420, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/local/lib/python3.7/site-packages/joblib-0.12.5-py3.7.egg/joblib/_parallel_backends.py", line 563, in __call__
    return self.func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/joblib-0.12.5-py3.7.egg/joblib/parallel.py", line 261, in __call__
    for func, args, kwargs in self.items]
  File "/usr/local/lib/python3.7/site-packages/joblib-0.12.5-py3.7.egg/joblib/parallel.py", line 261, in <listcomp>
    for func, args, kwargs in self.items]
  File "/usr/local/lib/python3.7/site-packages/s3tk-0.2.0-py3.7.egg/s3tk/__init__.py", line 172, in scan_object
    obj = s3.Object(bucket_name, key)
NameError: name 's3' is not defined
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/s3tk", line 4, in <module>
    __import__('pkg_resources').run_script('s3tk==0.2.0', 's3tk')
  File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 661, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1441, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.7/site-packages/s3tk-0.2.0-py3.7.egg/EGG-INFO/scripts/s3tk", line 7, in <module>
    s3tk.cli()
  File "/usr/local/lib/python3.7/site-packages/Click-7.0-py3.7.egg/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/Click-7.0-py3.7.egg/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/Click-7.0-py3.7.egg/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/Click-7.0-py3.7.egg/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/Click-7.0-py3.7.egg/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/s3tk-0.2.0-py3.7.egg/s3tk/__init__.py", line 489, in scan_object_acl
    summarize(parallelize(bucket, only, _except, scan_object))
  File "/usr/local/lib/python3.7/site-packages/s3tk-0.2.0-py3.7.egg/s3tk/__init__.py", line 267, in parallelize
    return Parallel(n_jobs=24)(delayed(fn)(bucket.name, os.key, *args) for os in objects if object_matches(os.key, only, _except))
  File "/usr/local/lib/python3.7/site-packages/joblib-0.12.5-py3.7.egg/joblib/parallel.py", line 996, in __call__
    self.retrieve()
  File "/usr/local/lib/python3.7/site-packages/joblib-0.12.5-py3.7.egg/joblib/parallel.py", line 899, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/usr/local/lib/python3.7/site-packages/joblib-0.12.5-py3.7.egg/joblib/_parallel_backends.py", line 517, in wrap_future_result
    return future.result(timeout=timeout)
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
NameError: name 's3' is not defined

Other commands do work OK:

$ docker run  -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY s3tk s3tk scan foobucket
foobucket
  ✔ ACL not open to public
  ✔ Policy not open to public
  ✘ Logging disabled
  ✔ Versioning enabled
  ✘ Default encryption disabled

The credentials I am using have admin access.

I had the same error with a virtualenv using brew Python 3.7.0 on Mac.

s3tk scan --object-level-logging GetEventSelectors trail name

Got this error when running s3tk scan --object-level-logging:

An error occurred (TrailNotFoundException) when calling the GetEventSelectors operation: Unknown trail: arn:aws:cloudtrail:us-west-2:xxxxxx:trail/yyyyyyy for the user: xxxxxxxxx

I think GetEventSelectors is expecting the short trail name yyyyyyy rather than the full ARN.

Edit: Looks like this issue was caused by the CloudTrail trail being in a different region than the default region set via aws configure. The above trail exists in us-east-1.

Edit: typos/formatting

Cannot import s3tk

Hi,

We should be able to import s3tk directly and skip CLI interaction, like:

>>> from s3tk import scan
>>> scan(['my_bucket'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/s3tk/__init__.py", line 389, in scan
    for bucket in fetch_buckets(buckets):
  File "/usr/local/lib/python2.7/site-packages/s3tk/__init__.py", line 82, in fetch_buckets
    return [s3.Bucket(bn) for bn in buckets]
NameError: global name 's3' is not defined

(Edit) Current workaround

import boto3
import s3tk
s3tk.s3 = boto3.resource('s3')
s3tk.scan(['my_bucket'])

Follow-up issue

Another issue is that the process exits at the end of the command: https://github.com/ankane/s3tk/blob/master/s3tk/__init__.py#L421

Regards

Docker image

Would you consider wrapping this repository up as a public Docker image that can be run without requiring all of the dependencies (including Python itself)?

grepable output?

While colours are pretty, I can't grep them. Is there an option, or will there be an option to produce an output that's easier to parse for errors on a large set of buckets?

s3tk scan --object-level-logging error

Ran the s3tk scan with the --object-level-logging flag and received the following error:

Traceback (most recent call last):
  File "/usr/local/bin/s3tk", line 7, in <module>
    s3tk.cli()
  File "/usr/lib/python2.7/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python2.7/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python2.7/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python2.7/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/s3tk/__init__.py", line 440, in scan
    event_selectors = fetch_event_selectors() if object_level_logging else {}
  File "/usr/local/lib/python2.7/dist-packages/s3tk/__init__.py", line 408, in fetch_event_selectors
    if trail['IsMultiRegionTrail']:
KeyError: 'IsMultiRegionTrail'

use with multiple accounts

Hi,
how can I use this with multiple accounts found in ~/.aws/credentials? In the AWS CLI it's simply --profile=PROFILENAME.

CSV output

Hi,

It would be great to have the output all on one line per bucket, with the fields in CSV format

e.g.

somerandombucket,ACL not open to public,Policy not open to public,Logging disabled,Versioning disabled

Thanks
