Git Product home page Git Product logo

s3-janitor's Introduction

s3-janitor

He takes the buckets and empties out the dirty bleach water.

Given a bucket prefix or explicit buckets, this searches those buckets for objects whose last modified date exceeds the cutoff date.

Usage

usage: janitor.py [-h] [--days DAYS] --access-key ACCESS_KEY --secret-key
                  SECRET_KEY --endpoint ENDPOINT
                  [--target-method {prefix,buckets}] [--prefix PREFIX]
                  [-b [BUCKETS [BUCKETS ...]]]
                  [--delete-method {list_then_delete,delete_every_page}] [-d]
                  [--dry-run]

optional arguments:
  -h, --help            show this help message and exit
  --days DAYS           days to keep
  --access-key ACCESS_KEY
                        s3 access key
  --secret-key SECRET_KEY
                        s3 secret key
  --endpoint ENDPOINT   endpoint url
  --target-method {prefix,buckets}
                        method of bucket selection
  --prefix PREFIX       bucket prefix to target
  -b [BUCKETS [BUCKETS ...]], --buckets [BUCKETS [BUCKETS ...]]
                        explicit buckets to target
  --delete-method {list_then_delete,delete_every_page}
                        deletion method
  -d, --debug           debug logging
  --dry-run             dry run mode, do not delete

It's possible to delete everything from a bucket by specifying the cutoff date as --days "-1". This should reverse the direction of the timedelta, instead adding days to the cutoff date.

Depending on the method you choose for --target-method you must use the appropriate option, prefix requires --prefix, buckets requires --buckets.

The deletion method can be tweaked. Boto3 paginates objects in chunks of 1000 objects during a listing, the default method list_then_delete is to obtain a full bucket listing, then delete all the objects in 1000 object chunks (the deletion function client.delete_objects only accepts 1000 objects at once).

The secondary option delete_every_page will queue up a page's worth of chunks and delete them before progressing to the next page. Worth considering if the bucket in question has an obscene amount of objects.

Example Usage

$ python janitor.py --access-key <some access key> --secret-key <some secret key> --endpoint "https://s3.example.com" --target-method buckets -b my-bucket --dry-run    
[2018-05-24 15:07:27] s3-janitor [INFO    ]: DRY RUN MODE: No deletions will be made
[2018-05-24 15:07:27] s3-janitor [INFO    ]: Searching for objects before cutoff date of 2018-02-23
[2018-05-24 15:07:28] s3-janitor.my-bucket_worker [INFO    ]: Found 931 objects in chunk 1 for my-bucket
[2018-05-24 15:07:29] s3-janitor.my-bucket_worker [INFO    ]: Found 499 objects in chunk 2 for my-bucket
[2018-05-24 15:07:30] s3-janitor.my-bucket_worker [INFO    ]: Found 365 objects in chunk 3 for my-bucket
[2018-05-24 15:07:31] s3-janitor.my-bucket_worker [INFO    ]: Found 0 objects in chunk 4 for my-bucket
[2018-05-24 15:07:31] s3-janitor.my-bucket_worker [INFO    ]: Found 0 objects in chunk 5 for my-bucket
... lots of chunks ...
[2018-05-24 15:08:20] s3-janitor.my-bucket_worker [INFO    ]: Found 0 objects in chunk 94 for my-bucket
[2018-05-24 15:08:28] s3-janitor.my-bucket_worker [INFO    ]: Found 0 objects in chunk 95 for my-bucket
[2018-05-24 15:08:28] s3-janitor.my-bucket_worker [INFO    ]: Total objects queued for deletion 1795
[2018-05-24 15:08:28] s3-janitor.my-bucket_worker [INFO    ]: 2 chunks formed from 1795 objects
[2018-05-24 15:08:28] s3-janitor.my-bucket_worker [INFO    ]: Deleting chunk 1/2 (1000 objects) for my-bucket
[2018-05-24 15:08:28] s3-janitor.my-bucket_worker [INFO    ]: Deleting chunk 2/2 (795 objects) for my-bucket
[2018-05-24 15:08:28] s3-janitor.my-bucket_worker [INFO    ]: Finished deletions

If you're thinking this is slower than in another language like Go (just listing objects in Go was 2-3x quicker for example), you're correct, if you have a similar project written in Go, ping me so I can point to it from here.

s3-janitor's People

Stargazers

 avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.