CloudTracker helps you find over-privileged IAM users and roles by comparing CloudTrail logs with current IAM policies.
License: BSD 3-Clause "New" or "Revised" License
I have a few feature requests that would make my life easier and not require me to refactor/rewrite the code.
If some or all of these are already covered by existing functionality, I apologize for missing it.
1. Ability to modify the CloudTrail log path mentioned in athena.py
cloudtrail_log_path = 's3://{bucket}/{path}/AWSLogs/{account_id}/CloudTrail'
Reasoning: My CloudTrail log path may not always be AWSLogs/{account_id}/CloudTrail
2. Ability to handle the scenario where an organization has multiple accounts and CloudTrail logs are stored centrally in a particular account
For example, we have three accounts: account_1, account_2, and account_3. account_1 and account_2 are my prod and dev accounts respectively. account_3 is my monitoring account, where I have a custom Lambda for pulling in CloudTrail logs from account_1 and account_2 and storing them in an S3 bucket in account_3.
When running queries, cloudtracker would need to interact with both the target account (prod or dev) and account_3.
3. Ability to generate reports in json/csv format
This will help feed results into tools like Splunk.
4. Ability to provide a --profile argument while executing cloudtracker
Reasoning: My AWS credentials may be stored in a profile other than default
Migrating setup/installation to setuptools would enable easier use of cloudtracker and would also allow publishing to PyPI.
Normal CloudTrail logs are stored in:
s3://my_log_bucket/OPTIONAL_PREFIX/AWSLogs/111111111111/CloudTrail/
Organization CloudTrail logs are stored in:
s3://my_log_bucket/OPTIONAL_PREFIX/AWSLogs/o-ORGANIZATION_ID/111111111111/CloudTrail/
I need to account for that o-ORGANIZATION_ID sub-directory.
One method may be to have the full path to the CloudTrail logs in the config.yaml, such as:
accounts:
- name: demo
id: 111111111111
iam: account-data/demo_iam.json
cloudtrail_logs: s3://my_log_bucket/OPTIONAL_PREFIX/AWSLogs/o-ORGANIZATION_ID/111111111111/CloudTrail/
I was informed that CloudTracker was showing a - next to cloudwatch:putmetricdata. I'm guessing this is a result of #58. I need to check what happened there. I believe also that this should be reported as events:putmetricdata.
Hey Scott,
Thanks for the amazing work, as usual.
Do you confirm that if I'm using:
cloudtracker --account myaccount --user myuser --show-used --start 2019-04-25
the --start option is not used?
I was trying to identify the last used actions on a specific user after a specific date. (forensic, and least privilege building for a new policy)
Thanks,
I think there's a bug here:
def determine_allowed(self):
[...]
# Look at denied
for stmt in self.stmts:
if stmt['Effect'] == 'Deny':
stmt_actions = self.get_actions_from_statement(stmt)
for action in stmt_actions:
if action in actions:
del actions[action]
Consider the following policy statements:
"Statement": [
{
"Action": "s3:*",
"Effect": "Allow",
"Resource": "*"
},
{
"Action": "s3:CreateBucket",
"Effect": "Deny",
"Resource": "*"
},
{
"Action": "s3:*",
"Effect": "Deny",
"Resource": [
"arn:aws:s3:::super-sensitive-bucket",
"arn:aws:s3:::super-sensitive-bucket/*"
]
}
]
Expected:
The list of allowed actions should contain everything except s3:CreateBucket.
Actual:
The list of allowed actions is empty.
A naive solution could be to only delete the action key if the resource is * (or maybe something like it... like s3://*).
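A minimal sketch of that naive fix (the function name and the `expand` callback are illustrative stand-ins for CloudTracker's `determine_allowed` and `get_actions_from_statement`, not its actual code): only honor a Deny when its Resource is the global wildcard.

```python
def allowed_actions(stmts, expand):
    """Compute the set of allowed actions from policy statements.

    `expand` maps a statement to its concrete action names (it stands in
    for CloudTracker's get_actions_from_statement).
    """
    actions = {}
    for stmt in stmts:
        if stmt['Effect'] == 'Allow':
            for action in expand(stmt):
                actions[action] = True

    # Look at denied -- but only honor denies that apply to every
    # resource; a Deny scoped to specific ARNs does not revoke the
    # action globally, which is the bug described above.
    for stmt in stmts:
        if stmt['Effect'] == 'Deny':
            resources = stmt.get('Resource', '*')
            if not isinstance(resources, list):
                resources = [resources]
            if resources != ['*']:
                continue
            for action in expand(stmt):
                actions.pop(action, None)
    return set(actions)
```

With the example policy above, the resource-scoped Deny on super-sensitive-bucket no longer empties the list; only s3:CreateBucket is removed.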
Currently cloudtracker is only looking at Managed Policies attached to a user's groups, but it should also look at Inline Policies.
Note that these are returned by separate APIs:
Inline policy:
$ aws iam list-group-policies --group-name Test-Group
{
"PolicyNames": [
"test-inline-policy-document"
]
}
Managed Policy (either AWS-defined or customer-defined):
$ aws iam list-attached-group-policies --group-name Test-Group
{
"AttachedPolicies": [
{
"PolicyName": "AmazonEC2FullAccess",
"PolicyArn": "arn:aws:iam::aws:policy/AmazonEC2FullAccess"
},
{
"PolicyName": "Test-Managed-EC2-Full-Access",
"PolicyArn": "arn:aws:iam::1234567891:policy/Test-Managed-EC2-Full-Access"
}
]
}
In my setup, I rely heavily on inline policies, so cloudtracker thinks I have zero permissions in the account, which is not correct.
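A sketch of gathering statements from both kinds of group policies with boto3 (the function and the way statements are merged are my assumptions; the IAM API calls themselves are real, though pagination is omitted for brevity):

```python
def group_statements(group_name):
    """Collect policy statements from a group's inline AND managed
    policies. Illustrative sketch, not CloudTracker's actual code."""
    import boto3  # imported lazily so the sketch reads standalone
    iam = boto3.client('iam')
    statements = []

    # Inline policies: list_group_policies, then fetch each document.
    for name in iam.list_group_policies(GroupName=group_name)['PolicyNames']:
        doc = iam.get_group_policy(GroupName=group_name,
                                   PolicyName=name)['PolicyDocument']
        statements.extend(doc['Statement'])

    # Managed policies (AWS- or customer-defined): resolve each attached
    # ARN to its default policy version's document.
    for att in iam.list_attached_group_policies(
            GroupName=group_name)['AttachedPolicies']:
        policy = iam.get_policy(PolicyArn=att['PolicyArn'])['Policy']
        doc = iam.get_policy_version(
            PolicyArn=att['PolicyArn'],
            VersionId=policy['DefaultVersionId'],
        )['PolicyVersion']['Document']
        statements.extend(doc['Statement'])

    return statements
```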
It appears that cloudtracker doesn't enumerate the full list of granted permissions for both users and roles that utilize the "NotAction" clause.
Example
IAM Policy:
{
"Sid": "AllowAllOperationsExceptIamAndCloudTrail",
"Effect": "Allow",
"Resource": "*",
"NotAction": [
"iam:*",
"cloudtrail:*"
]
},
CloudTracker's output for this role shows only the permissions granted by other policies that use the "Action" clause, while a large number of services that were actually used via this policy carry the "+" designation.
This issue can lead to inaccurate results and missed permissions when using the tool.
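One way to handle it would be to expand NotAction against the full known action list; a sketch (the function name and the fnmatch-based matching are illustrative, not how CloudTracker expands actions today):

```python
import fnmatch

def statement_actions(stmt, all_actions):
    """Expand a statement to concrete action names, honoring NotAction:
    everything in the known action list EXCEPT the listed patterns.
    `all_actions` would come from something like aws_api_list.txt."""
    def matches(patterns):
        if not isinstance(patterns, list):
            patterns = [patterns]
        return {a for a in all_actions
                for p in patterns if fnmatch.fnmatch(a.lower(), p.lower())}

    if 'Action' in stmt:
        return matches(stmt['Action'])
    # NotAction: grant everything known minus the excluded patterns.
    return set(all_actions) - matches(stmt['NotAction'])
```

For the policy above, every known action except those under iam:* and cloudtrail:* would then be reported as granted.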
Hi, I wonder if someone can help me, please. I installed cloudtracker on my Linux OS and configured it following the steps on GitHub, but for some reason it is not working, and I wonder what command or additional configuration I may be missing.
Yours sincerely
Currently you have to rerun CloudTracker over and over again for every user and role. I should be able to just dump a report for all of them.
I've learned a lot more about IAM vs API naming since the initial development of CloudTracker and recorded those here: https://summitroute.com/blog/2018/06/28/aws_iam_vs_api_vs_cloudtrail/
I should download the list of IAM privileges from the Policy Generator and the list of API calls and make a giant dictionary. Additionally, Will Bengtson has mentioned to me that he has a way of generating the CloudTrail logs for all of the calls to ensure the naming is accurate across all three places. This should probably just look like:
{
  "api": "",
  "cloudtrail": "",
  "iam": "",
  "data": false
}
Where data would indicate whether or not you need data-level logging turned on.
Do we have any details beyond the simple action that's needed or not needed? Is there a way to see, for example, that putobject was needed, but not for ALL resources? Or that it was only used for a subset of the resources available? Can we show unused actions as a tuple of the action AND the resource it applies to, in case it's used for one resource but unused for another?
The query language semantics change between the major versions of elasticsearch. Seems that the version I'm testing against is not compatible with cloudtracker. Seems that elasticsearch 2.0 is a minimum requirement, based on the error I observed. Is that correct? (or maybe it's 6.0, based on requirements.txt contents). It would be helpful to include a "compatibility matrix" in the README that shows what versions are known to work and when you're in uncharted territories.
Bonus points for automagically supporting all major versions :)
Now that I have Athena support, I'd like to avoid installing the ElasticSearch libraries as they are not needed. The setup process for CloudTracker when using ElasticSearch involves a Makefile, making this issue more complicated.
Looks like it is hitting an S3 API call limit, which I realize is not a CloudTracker bug per se, but I'm wondering if anyone else has hit this and if there's a workaround.
See also https://aws.amazon.com/premiumsupport/knowledge-center/emr-s3-503-slow-down/
Thanks,
Paul
INFO Checking if all partitions for the past 12 months exist
Traceback (most recent call last):
File "/Users/paul.oflynn/venv/bin/cloudtracker", line 11, in <module>
load_entry_point('cloudtracker==2.1.2', 'console_scripts', 'cloudtracker')()
File "/Users/paul.oflynn/venv/lib/python3.7/site-packages/cloudtracker/cli.py", line 104, in main
run(args, config, args.start, args.end)
File "/Users/paul.oflynn/venv/lib/python3.7/site-packages/cloudtracker/__init__.py", line 443, in run
performed_actors = datasource.get_performed_users()
File "/Users/paul.oflynn/venv/lib/python3.7/site-packages/cloudtracker/datasources/athena.py", line 317, in get_performed_users
response = self.query_athena(query)
File "/Users/paul.oflynn/venv/lib/python3.7/site-packages/cloudtracker/datasources/athena.py", line 74, in query_athena
self.wait_for_query_to_complete(response['QueryExecutionId'])
File "/Users/paul.oflynn/venv/lib/python3.7/site-packages/cloudtracker/datasources/athena.py", line 113, in wait_for_query_to_complete
reason=response['QueryExecution']['Status']['StateChangeReason']))
Exception: Query entered state FAILED with reason HIVE_CURSOR_ERROR: Please reduce your request rate. (Service: Amazon S3; Status Code: 503; Error Code: SlowDown; Request ID: AA42SNIP3D63702; S3 Extended Request ID: zISuyGbB2MtV84gcJwjpvoSNIPI5WyUM+C8Ln+XcxSNIPgmL3Jsmi/EJYYFQdW9s=)
(venv) cloud-tracker $
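One possible workaround sketch (purely an assumption on my part, not an existing CloudTracker feature): wrap the Athena query call in jittered exponential backoff whenever S3 answers with SlowDown.

```python
import random
import time

def with_backoff(run_query, query, max_attempts=5, base=1.0):
    """Retry `run_query(query)` while the error text mentions SlowDown.
    `run_query` stands in for whatever executes the Athena query."""
    for attempt in range(max_attempts):
        try:
            return run_query(query)
        except Exception as exc:
            if 'SlowDown' not in str(exc) or attempt == max_attempts - 1:
                raise
            # Jittered exponential backoff: base, 2*base, 4*base, ...
            time.sleep(base * (2 ** attempt) + random.random() * base)
```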
Currently, the keyword argument python_requires of setup() is not set, and thus it is assumed that this distribution is compatible with all Python versions.
However, I found it is not compatible with Python 2. My local Python version is 2.7, and I encountered the following error when executing "pip install cloudtracker":
Collecting cloudtracker
Using cached cloudtracker-2.1.5.tar.gz (80 kB)
ERROR: Command errored out with exit status 1:
command: /usr/local/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-wbT82F/cloudtracker/setup.py'"'"'; __file__='"'"'/tmp/pip-install-wbT82F/cloudtracker/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-wbT82F/cloudtracker/pip-egg-info
cwd: /tmp/pip-install-wbT82F/cloudtracker/
Complete output (7 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-wbT82F/cloudtracker/setup.py", line 33, in <module>
long_description=get_description(),
File "/tmp/pip-install-wbT82F/cloudtracker/setup.py", line 22, in get_description
return open(os.path.join(os.path.abspath(HERE), 'README.md'), encoding='utf-8').read()
TypeError: 'encoding' is an invalid keyword argument for this function
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
I found that setup.py uses a Python3-specific keyword argument, encoding, for the function open, which leads to installation failure of cloudtracker in Python 2.
Way to fix: modify setup() in setup.py to add the python_requires keyword argument:
setup(…
python_requires='>=3',
…)
Thanks for your attention.
Best regards,
PyVCEchecker
Updates are done with
python setup.py sdist
twine upload --repository pypi dist/*
GitHub filed an alert due to the version of pyyaml I'm using. The only place YAML is loaded should be the config file that users create themselves, so there should be no impact, but I'll update this this evening.
Getting the following error while running the "python cloudtracker.py --account demo --list users" command.
FileNotFoundError: [Errno 2] No such file or directory: 'account-data/demo-iam.json'
I have double checked and the demo-iam.json file exists under account-data. Can you please comment/check?
(venv) ➜ ~ cloudtracker --account demo --list users
Traceback (most recent call last):
File "/Users/apple/venv/bin/cloudtracker", line 8, in
sys.exit(main())
File "/Users/apple/venv/lib/python3.10/site-packages/cloudtracker/cli.py", line 97, in main
config = yaml.load(args.config)
File "/Users/apple/venv/lib/python3.10/site-packages/yaml/init.py", line 72, in load
return loader.get_single_data()
File "/Users/apple/venv/lib/python3.10/site-packages/yaml/constructor.py", line 37, in get_single_data
return self.construct_document(node)
File "/Users/apple/venv/lib/python3.10/site-packages/yaml/constructor.py", line 46, in construct_document
for dummy in generator:
File "/Users/apple/venv/lib/python3.10/site-packages/yaml/constructor.py", line 398, in construct_yaml_map
value = self.construct_mapping(node)
File "/Users/apple/venv/lib/python3.10/site-packages/yaml/constructor.py", line 204, in construct_mapping
return super().construct_mapping(node, deep=deep)
File "/Users/apple/venv/lib/python3.10/site-packages/yaml/constructor.py", line 126, in construct_mapping
if not isinstance(key, collections.Hashable):
AttributeError: module 'collections' has no attribute 'Hashable'
(venv) ➜ ~
I encounter the above error whenever I try to run the cloudtracker commands.
Being dependent on pyjq makes setup/installation more difficult than it needs to be. I'd suggest migrating to jmespath, which has a syntax similar to jq but whose Python package does not have the extra installation requirements.
Instead of listing every privilege individually, I should glob the privileges, so only display ec2:* instead of every EC2 privilege. Netflix's policyuniverse has functionality that may help with this.
Linking the previous issue as well: #46
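A sketch of what the globbing could look like (function name and display format are mine, and a real version would likely lean on policyuniverse instead): collapse a service to service:* only when every known action of that service was granted.

```python
from collections import defaultdict

def glob_privileges(granted, all_known):
    """Collapse per-service privilege lists: if every known action of a
    service is granted, display 'service:*' instead of each action."""
    by_service = defaultdict(set)
    for action in all_known:
        by_service[action.split(':')[0]].add(action)

    display = []
    granted = set(granted)
    for service, known in sorted(by_service.items()):
        if known <= granted:
            display.append(service + ':*')
        else:
            display.extend(sorted(known & granted))
    return display
```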
Hi there, I'm really excited to try out Cloudtracker! I've completed the setup config steps, but I've run into the following error message when running my first command (cloudtracker --account demo --list users):
Python version: 3.6.9
Ubuntu version: 18.04
(venv) ~$ cloudtracker --account demo --list users --start 2020-06-01
/home/username/venv/lib/python3.6/site-packages/cloudtracker/cli.py:97: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(args.config)
INFO Source of CloudTrail logs: s3://cloudtrail-bucket/
INFO Using AWS identity: arn:aws:iam::339389476548:user/my_iam_user
INFO Using output bucket: s3://aws-athena-query-results-account-number-us-west-2
INFO Account cloudtrail log path: s3://cloudtrail-bucket/AWSLogs/account-number/CloudTrail
INFO Checking if all partitions for the past 12 months exist
INFO Partition groups remaining to create: 12
INFO Partition groups remaining to create: 11
INFO Partition groups remaining to create: 10
INFO Partition groups remaining to create: 9
INFO Partition groups remaining to create: 8
INFO Partition groups remaining to create: 7
INFO Partition groups remaining to create: 6
INFO Partition groups remaining to create: 5
INFO Partition groups remaining to create: 4
INFO Partition groups remaining to create: 3
INFO Partition groups remaining to create: 2
INFO Partition groups remaining to create: 1
Traceback (most recent call last):
File "/home/username/venv/bin/cloudtracker", line 11, in <module>
sys.exit(main())
File "/home/username/venv/lib/python3.6/site-packages/cloudtracker/cli.py", line 104, in main
run(args, config, args.start, args.end)
File "/home/username/venv/lib/python3.6/site-packages/cloudtracker/__init__.py", line 436, in run
account_iam = get_account_iam(account)
File "/home/username/venv/lib/python3.6/site-packages/cloudtracker/__init__.py", line 162, in get_account_iam
return json.load(open(account['iam']))
File "/usr/lib/python3.6/json/__init__.py", line 299, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
This is using my default AWS CLI user, with the appropriate IAM permissions assigned to that access key.
Thanks!
Although the use of ElasticSearch is very flexible, faster, and cheaper than Athena for people that already have it, it is not popular and it would be better to focus on Athena support.
The data might be encrypted, so you'll have to specify 'has_encrypted_data'='true' in the TBLPROPERTIES. I don't want too many of these recent issues I've created for creating the Athena tables, but some common situations are reasonable to handle.
Using CloudTracker with Athena support creates a database in Athena. This doesn't cost any money, but it should be possible to have CloudTracker clean up after itself via something like python cloudtracker.py cleanup. After every query, Athena will also create objects in a result S3 bucket. These do cost a tiny bit of money and should be cleaned up after every run.
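A sketch of what such a cleanup subcommand could do (the function, its arguments, and the `cleanup` command itself are the proposal, not existing code; the boto3 calls are real):

```python
def cleanup(database, results_bucket, results_prefix=''):
    """Drop the Athena database CloudTracker created and delete the
    query-result objects from the results bucket."""
    import boto3  # imported lazily so the sketch reads standalone

    athena = boto3.client('athena')
    athena.start_query_execution(
        QueryString='DROP DATABASE IF EXISTS {} CASCADE'.format(database),
        ResultConfiguration={'OutputLocation': 's3://' + results_bucket},
    )

    # Remove the result objects, which do incur (tiny) storage costs.
    s3 = boto3.resource('s3')
    s3.Bucket(results_bucket).objects.filter(Prefix=results_prefix).delete()
```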
Use https://github.com/netflix-skunkworks/policyuniverse instead of https://github.com/duo-labs/cloudtracker/blob/master/cloudtracker/__init__.py#L80 and aws_api_list.txt. This would also support NotAction (cloudtracker/cloudtracker/__init__.py, line 69 in 33852a6).
--ignore-benign flag to more accurately identify benign actions beyond List* and Describe*.
Need to push changes to that project to support some of CloudTracker's needs.
There appears to be a bug with Python 3.9.
Relevant info: pipx 1.1.0, Python 3.9.6, macOS.
Workaround: use Python 3.7.
$ cloudtracker --account legacy --list users
INFO Source of CloudTrail logs: s3://aws-cloudtrail-logs-111111111111-test/
INFO Using AWS identity: arn:aws:iam::111111111111:user/[email protected]
INFO Using output bucket: s3://aws-athena-query-results-111111111111-us-east-1
INFO Account cloudtrail log path: s3://aws-cloudtrail-logs-111111111111-test//AWSLogs/111111111111/CloudTrail
debug:
bucket is aws-cloudtrail-logs-111111111111-test
path is
path is null? False
Traceback (most recent call last):
File "/Users/almenon/.local/bin/cloudtracker", line 8, in <module>
sys.exit(main())
File "/Users/almenon/.local/pipx/venvs/cloudtracker/lib/python3.9/site-packages/cloudtracker/cli.py", line 104, in main
run(args, config, args.start, args.end)
File "/Users/almenon/.local/pipx/venvs/cloudtracker/lib/python3.9/site-packages/cloudtracker/__init__.py", line 421, in run
datasource = Athena(config['athena'], account, start, end, args)
File "/Users/almenon/.local/pipx/venvs/cloudtracker/lib/python3.9/site-packages/cloudtracker/datasources/athena.py", line 211, in __init__
resp = self.s3.list_objects_v2(Bucket=config['s3_bucket'], Prefix=config['path'], MaxKeys=1)
File "/Users/almenon/.local/pipx/venvs/cloudtracker/lib/python3.9/site-packages/botocore/client.py", line 324, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/Users/almenon/.local/pipx/venvs/cloudtracker/lib/python3.9/site-packages/botocore/client.py", line 608, in _make_api_call
http, parsed_response = self._endpoint.make_request(
File "/Users/almenon/.local/pipx/venvs/cloudtracker/lib/python3.9/site-packages/botocore/endpoint.py", line 143, in make_request
return self._send_request(request_dict, operation_model)
File "/Users/almenon/.local/pipx/venvs/cloudtracker/lib/python3.9/site-packages/botocore/endpoint.py", line 169, in _send_request
success_response, exception = self._get_response(
File "/Users/almenon/.local/pipx/venvs/cloudtracker/lib/python3.9/site-packages/botocore/endpoint.py", line 247, in _get_response
parsed_response = parser.parse(
File "/Users/almenon/.local/pipx/venvs/cloudtracker/lib/python3.9/site-packages/botocore/parsers.py", line 210, in parse
parsed = self._do_error_parse(response, shape)
File "/Users/almenon/.local/pipx/venvs/cloudtracker/lib/python3.9/site-packages/botocore/parsers.py", line 750, in _do_error_parse
return self._parse_error_from_body(response)
File "/Users/almenon/.local/pipx/venvs/cloudtracker/lib/python3.9/site-packages/botocore/parsers.py", line 774, in _parse_error_from_body
self._replace_nodes(parsed)
File "/Users/almenon/.local/pipx/venvs/cloudtracker/lib/python3.9/site-packages/botocore/parsers.py", line 395, in _replace_nodes
if value.getchildren():
AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'getchildren'
We already ingest cloudtrail data to an elasticsearch cluster, but we have the data indexed in a nested dictionary structure for namespacing reasons.
For example, if a cloudtrail json blob looks roughly like:
{
"eventVersion": "1.0",
"userIdentity": {
"type": "IAMUser",
"principalId": "EX_PRINCIPAL_ID",
"arn": "arn:aws:iam::123456789012:user/Alice",
"accessKeyId": "EXAMPLE_KEY_ID",
"accountId": "123456789012",
"userName": "Alice"
},
"eventTime": "2014-03-06T21:22:54Z",
"eventSource": "ec2.amazonaws.com",
"eventName": "StartInstances",
"awsRegion": "us-east-2",
"sourceIPAddress": "205.251.233.176",
"userAgent": "ec2-api-tools 1.6.12.2",
"requestParameters": {
"instancesSet": {
"items": [
{
"instanceId": "i-ebeaf9e2"
}
]
}
},
"responseElements": {
"instancesSet": {
"items": [
{
"instanceId": "i-ebeaf9e2",
"currentState": {
"code": 0,
"name": "pending"
},
"previousState": {
"code": 80,
"name": "stopped"
}
}
]
}
}
}
It would actually look like:
{
"the_cloudtrail_data": {
"eventVersion": "1.0",
"userIdentity": {
"type": "IAMUser",
"principalId": "EX_PRINCIPAL_ID",
"arn": "arn:aws:iam::123456789012:user/Alice",
"accessKeyId": "EXAMPLE_KEY_ID",
"accountId": "123456789012",
"userName": "Alice"
},
"eventTime": "2014-03-06T21:22:54Z",
"eventSource": "ec2.amazonaws.com",
"eventName": "StartInstances",
"awsRegion": "us-east-2",
"sourceIPAddress": "205.251.233.176",
"userAgent": "ec2-api-tools 1.6.12.2",
"requestParameters": {
"instancesSet": {
"items": [
{
"instanceId": "i-ebeaf9e2"
}
]
}
},
"responseElements": {
"instancesSet": {
"items": [
{
"instanceId": "i-ebeaf9e2",
"currentState": {
"code": 0,
"name": "pending"
},
"previousState": {
"code": 80,
"name": "stopped"
}
}
]
}
}
}
}
I think cloudtracker could expose the concept of a key_prefix or key_namespace or something, to account for this type of setup.
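A sketch of how such a key_prefix option could be applied when building field names for ES queries (the option name and helper are the suggestion itself, not existing CloudTracker code):

```python
def prefixed(field, key_prefix=''):
    """Map a CloudTrail field name to its location in the index,
    prepending the configured namespace when one is set."""
    return '{}.{}'.format(key_prefix, field) if key_prefix else field

# With key_prefix: the_cloudtrail_data in config, a term filter on
# eventName would target the_cloudtrail_data.eventName instead.
query = {'term': {prefixed('eventName', 'the_cloudtrail_data'): 'StartInstances'}}
```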
At least, sqs:ReceiveMessage (and various other SQS APIs) do not actually appear in CloudTrail. The only SQS-related ones I actually see in CloudTrail across several accounts over the past 30 days are sqs:CreateQueue, sqs:DeleteQueue, sqs:PurgeQueue, and sqs:SetQueueAttributes.
I imagine there are others that are incorrectly present in this list.
I know that this list was updated ~3 months ago. Maybe the procedure for generating/verifying that list's accuracy could be improved or revisited?
@0xdabbad00 I am using a Linux server. Kindly help me sort it out.
Exception:
Traceback (most recent call last):
File "/home/ec2-user/cloudwatchpoc/cloudtracker/venv/lib64/python3.7/site-packages/pip/basecommand.py", line 215, in main
status = self.run(options, args)
File "/home/ec2-user/cloudwatchpoc/cloudtracker/venv/lib64/python3.7/site-packages/pip/commands/install.py", line 365, in run
strip_file_prefix=options.strip_file_prefix,
File "/home/ec2-user/cloudwatchpoc/cloudtracker/venv/lib64/python3.7/site-packages/pip/req/req_set.py", line 784, in install
**kwargs
File "/home/ec2-user/cloudwatchpoc/cloudtracker/venv/lib64/python3.7/site-packages/pip/req/req_install.py", line 854, in install
strip_file_prefix=strip_file_prefix
File "/home/ec2-user/cloudwatchpoc/cloudtracker/venv/lib64/python3.7/site-packages/pip/req/req_install.py", line 1069, in move_wheel_files
strip_file_prefix=strip_file_prefix,
File "/home/ec2-user/cloudwatchpoc/cloudtracker/venv/lib64/python3.7/site-packages/pip/wheel.py", line 345, in move_wheel_files
clobber(source, lib_dir, True)
File "/home/ec2-user/cloudwatchpoc/cloudtracker/venv/lib64/python3.7/site-packages/pip/wheel.py", line 316, in clobber
ensure_dir(destdir)
File "/home/ec2-user/cloudwatchpoc/cloudtracker/venv/lib64/python3.7/site-packages/pip/utils/init.py", line 83, in ensure_dir
os.makedirs(path)
File "/usr/lib64/python3.7/os.py", line 221, in makedirs
mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/home/ec2-user/cloudwatchpoc/cloudtracker/venv/lib/python3.7/site-packages/urllib3'
Better describe the coloring and the use of '-', '?', and '+' in the output
In tracking cross-account role assumptions, I should use sharedEventID as explained in https://aws.amazon.com/blogs/security/aws-cloudtrail-now-tracks-cross-account-activity-to-its-origin/
This should be fixed at cloudtracker/cloudtracker/datasources/es.py, line 135 in 33852a6.
Background
The aws cli supports --profile demo-account-1, other than default; however, I didn't find that config.yaml supports a profile. How does it work for multiple accounts? If there is no such option yet, can we create a feature request? I am happy to help.
(Lines 8 to 14 in 51e3ec3; cloudtracker/cloudtracker/datasources/athena.py, lines 199 to 200 in 51e3ec3)
Expected: have a profile in the config:
accounts:
- name: demo
id: 111111111111
profile: demo-account-1
iam: account-data/demo_iam.json
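A sketch of how the proposed profile key could be honored (the config key and function are the proposal, not existing code; boto3.Session(profile_name=...) is the real API):

```python
def session_for_account(account):
    """Build a boto3 session from an account entry in config.yaml,
    honoring an optional per-account `profile` key."""
    import boto3  # imported lazily so the sketch reads standalone
    profile = account.get('profile')  # e.g. 'demo-account-1'
    if profile:
        return boto3.Session(profile_name=profile)
    return boto3.Session()  # default credential chain, as today
```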
Hello All,
Even after providing the listed permissions to the service account this will be running with, I am running into AWS permissions errors, my latest being one around Glue permissions.
Is there a full list of permissions that this requires? In my environment, granting a resource wildcard is not possible.
I see that under data/aws_api_list and Cloudtrail_supported_actions there is a large list of actions, but I am assuming that is just an exhaustive list and not one of required permissions for the running user.
Thanks!
The query language semantics are slightly different (e.g. exists in 1.x was a filter; in later versions it is a query. See http://www.dlxedu.com/askdetail/3/0620e1124992fb281da93c7efe53b97f.html and https://www.elastic.co/guide/en/elasticsearch/reference/2.0/breaking_20_query_dsl_changes.html)
When trying to run cloudtracker against an AWS GovCloud account, no results are returned, because the partitions are being built against the list of Commercial regions, which it gets from the get_available_regions call. https://github.com/duo-labs/cloudtracker/blob/master/cloudtracker/datasources/athena.py#L274
This is because the partition_name arg is omitted (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html#boto3.session.Session.get_available_regions).
Obviously, this is fine for most use cases, but it would be nice if we could toggle a flag in config to get GovCloud-specific env data.
These are:
arn:aws:iam::aws:policy/AmazonAthenaFullAccess
and access to the bucket with the CloudTrail logs:
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- 's3:GetObject'
- 's3:ListBucket'
Resource:
- s3bucket/*
- s3bucket
Regarding #41, CloudTracker should provide a better error message to explain that the IAM data file might be out of date.
I've used CloudTracker just to setup the Athena tables for all of the accounts at a company. I've done this by running list users
with CloudTracker for each account by hand. I should have an automated way to just set this up for all accounts, which would also not need to have the IAM info collected.
For the life of me I can't figure out why this isn't working. Every other role (built the same way using terraform) works fine, just not this one. I'm not sure how to begin debugging this?
(PS Note I redacted some stuff in this log)
cloudtracker --account demo --role build-jenkins
INFO Source of CloudTrail logs: s3://wd-697111119245-bucket/
INFO Using AWS identity: arn:aws:sts::697111119245:assumed-role/SSOAdminRole/botocore-session-1540918197
INFO Using output bucket: s3://aws-athena-query-results-697111119245-us-west-2
INFO Account cloudtrail log path: s3://wd-697111119245-bucket//AWSLogs/697111119245/CloudTrail
INFO Checking if all partitions for the past 12 months exist
Traceback (most recent call last):
File "/Users/paul.oflynn/workspace/venv/bin/cloudtracker", line 11, in <module>
load_entry_point('cloudtracker==2.1.2', 'console_scripts', 'cloudtracker')()
File "/Users/paul.oflynn/workspace/venv/lib/python3.7/site-packages/cloudtracker/cli.py", line 104, in main
run(args, config, args.start, args.end)
File "/Users/paul.oflynn/workspace/venv/lib/python3.7/site-packages/cloudtracker/__init__.py", line 481, in run
role_iam = get_role_iam(rolename, account_iam)
File "/Users/paul.oflynn/workspace/venv/lib/python3.7/site-packages/cloudtracker/__init__.py", line 219, in get_role_iam
raise Exception("Unknown role named {}".format(rolename))
Exception: Unknown role named build-jenkins
(venv) cloud-tracker $ aws iam get-role --role-name build-jenkins
{
"Role": {
"Path": "/jenkins/",
"RoleName": "build-jenkins",
"RoleId": "AROAI2***********",
"Arn": "arn:aws:iam::697111119245:role/jenkins/build-jenkins",
"CreateDate": "2018-10-29T17:26:16Z",
"AssumeRolePolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::697111119245:role/nodes.*******.com",
"Service": "ec2.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
},
"MaxSessionDuration": 3600
}
}
(venv) cloud-tracker $
Netflix open-sourced some of their internal work that shares strong similarities with CloudTracker:
https://github.com/Netflix-Skunkworks/repokid-extras
Keeping this here as a note so that project can be reviewed to avoid duplicate efforts.
Having an issue with the CloudTracker output. According to the documentation CloudTracker shows a diff of the privileges granted vs used. The symbols mean the following:
"No symbol" means this privilege is used, so leave it as is.
- A minus sign means the privilege was granted, but not used, so you should remove it.
? A question mark means the privilege was granted, but it is unknown if it was used because it is not recorded in CloudTrail.
+ A plus sign means the privilege was not granted, but was used. The only way this is possible is if the privilege was previously granted, used, and then removed, so you may want to add that privilege back.
I just needed to understand something about the output.
For example, when checking the privileges for the "X" role, let's say I got "+ iam:createrole", which means that privilege was previously granted and used but later removed, according to the documentation. But the "X" role has the permission to create roles, so the output should have been "no symbol" instead of "+" for iam:createrole. Am I right? Can anyone clarify this?
From ajkerrigan on og-aws, he mentioned that you can use "add if not exists and then add all partitions in a single query. I'm not sure where it tops out, but I've been able to conditionally add 365 partitions in a single call. ... That single call ended up taking ~15 seconds ... Combine it with dateutil's rrule and if not exists to avoid double-adds, and it's actually pretty sexy."
Partitions are added here:
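A sketch of building that single batched query (table name, bucket, region, and month range are placeholders; only the ALTER TABLE ... ADD IF NOT EXISTS syntax itself is standard Athena/Hive DDL):

```python
from datetime import datetime
from dateutil.rrule import rrule, MONTHLY

clauses = []
for dt in rrule(MONTHLY, dtstart=datetime(2018, 1, 1), count=12):
    clauses.append(
        "PARTITION (year='{y}', month='{m:02d}') "
        "LOCATION 's3://my_log_bucket/AWSLogs/111111111111/CloudTrail/"
        "us-east-1/{y}/{m:02d}/'".format(y=dt.year, m=dt.month)
    )

# One call conditionally adds all twelve partitions at once.
query = 'ALTER TABLE cloudtrail_logs ADD IF NOT EXISTS\n' + '\n'.join(clauses)
```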
Getting setup with ElasticSearch and loading all logs into it, especially if using hindsight which has a complicated install process, is a big hurdle for anyone wanting to use this project. Additionally, it can create large delays, as loading logs into ES happens at about 100K/minute, in addition to downloading from S3 and normalizing via jq, so a year of logs for an account is likely to take > 8 hours.
Athena would be much faster and easier.
You should be able to specify the field containing the events in your index. (In my case, the field eventTime, and in fact even the_cloudtrail_data.eventTime, are not fields that exist in my index.)
Let's say this tool has told you that a user has some unused privilege. The next thing you'll want to know is why the user has that privilege in the first place, especially if there is potentially a condition or just if this user is a member of many groups and this is due to an attached policy within that group, or if the privilege was granted with wildcards which will make it tougher to grep for. So we should know the source of these privileges.