download-file-from-tar-in-s3

Introduction / Motivation

One of the most painful aspects of backing up your EBS/EFS files to S3 is that you lose the POSIX permissions. In some cases this is perfectly acceptable, as you can rebuild the permissions using chmod and/or chown. However, there are scenarios in which rebuilding the file permissions is unrealistic, for a variety of reasons that you probably already know if you got to this page.

In the case of EFS, things get a bit trickier as you often find yourself in one of the two scenarios described below:

  1. You may use an EFS-to-EFS backup solution similar to this one developed by the AWS Solutions Builder team. It's a fairly good solution as it keeps the POSIX permissions across multiple EFS file systems, performs incremental backups and handles the backup retention policy. However, it can be difficult to operate depending on your VPC/network configuration (subnets with or without auto-assigned public IPs, proxy configuration, DHCP option sets, specific security groups, frequency <-> throughput). It is worth mentioning that the price per GB stored on EFS is among the highest on AWS, so cost-sensitive architectures will usually rule out this approach.

  2. You may write an EFS-to-S3 solution using a cron job that runs aws s3 sync to copy the files to S3. However, with this approach you lose the file permissions and symbolic links, the sync can take a very long time depending on the size of the files and the instance type and, most importantly, your S3 API costs will skyrocket as you are charged per request. On the other hand, you can manage the S3 data lifecycle granularly (data tiering), and the price per GB is likely the lowest you will find.

As you can see, this is one of those situations in which you need to choose the lesser evil. Few people realise it, but although the second option seems to be a good trade-off, transitioning many small files from S3 to Glacier may not be the wisest idea (lifecycle transitions are charged per object, and each archived object carries a fixed storage overhead). In addition, depending on the size of your EFS file system you may be throttled, which adds a whole new level of complexity.

As a result, some architects/developers opt for bundling multiple files into a tar, which keeps the POSIX permissions, reduces the overall number of requests to the S3 API and makes Glacier archive usage cost-effective (sounds good, hey?). However, this implies that developers lose the ability to retrieve a single file (until now :]). This experimental utility tool generates a metadata file which can later be used to download individual files, giving you the best of both worlds.
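To make the idea concrete, here is a minimal sketch of how such a metadata index could work. It is not this tool's actual metadata format or code: the class `TarIndexSketch`, its methods and the sample entry names are all hypothetical, and an in-memory byte array stands in for the object stored in S3. The sketch walks the 512-byte tar headers, records where each regular file's data starts and how long it is, and turns an entry into the value of an HTTP `Range` header, which is what an S3 GET request would need in order to fetch only that file.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: index a tar archive so one entry can be fetched by byte range. */
final class TarIndexSketch {
    static final int BLOCK = 512; // tar headers and file data are padded to 512-byte blocks

    /** Where one entry's data lives inside the archive. */
    static final class Entry {
        final long offset;
        final long size;
        Entry(long offset, long size) { this.offset = offset; this.size = size; }
        /** HTTP Range header value (inclusive end); assumes size > 0. */
        String rangeHeader() { return "bytes=" + offset + "-" + (offset + size - 1); }
    }

    /** Walks the headers and maps entry name -> data range (regular files only). */
    static Map<String, Entry> buildIndex(byte[] tar) {
        Map<String, Entry> index = new LinkedHashMap<>();
        int pos = 0;
        // Simplification: a header whose first byte is zero is treated as end of archive.
        while (pos + BLOCK <= tar.length && tar[pos] != 0) {
            String name = field(tar, pos, 100);                              // name: bytes 0..99
            long size = Long.parseLong(field(tar, pos + 124, 12).trim(), 8); // size: octal text
            byte type = tar[pos + 156];                                      // typeflag
            if (type == '0' || type == 0) {
                index.put(name, new Entry(pos + BLOCK, size));
            }
            pos += BLOCK + (int) ((size + BLOCK - 1) / BLOCK * BLOCK);       // skip padded data
        }
        return index;
    }

    /** Stand-in for a ranged GET: copies only the bytes of one entry. */
    static byte[] slice(byte[] tar, Entry e) {
        return Arrays.copyOfRange(tar, (int) e.offset, (int) (e.offset + e.size));
    }

    /** Reads a NUL-terminated ASCII header field. */
    private static String field(byte[] buf, int start, int len) {
        int end = start;
        while (end < start + len && buf[end] != 0) end++;
        return new String(buf, start, end - start, StandardCharsets.US_ASCII);
    }

    private static void put(byte[] buf, int at, String ascii) {
        byte[] b = ascii.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(b, 0, buf, at, b.length);
    }

    /** Builds one minimal ustar-style entry (header + padded data); demo data only. */
    static byte[] tarEntry(String name, byte[] data) {
        int padded = (data.length + BLOCK - 1) / BLOCK * BLOCK;
        byte[] out = new byte[BLOCK + padded];
        put(out, 0, name);
        put(out, 100, "0000644");                           // mode
        put(out, 124, String.format("%011o", data.length)); // size in octal
        put(out, 136, "00000000000");                       // mtime
        Arrays.fill(out, 148, 156, (byte) ' ');             // checksum counted as spaces
        out[156] = '0';                                     // typeflag: regular file
        put(out, 257, "ustar");                             // magic
        put(out, 263, "00");                                // version
        int sum = 0;
        for (int i = 0; i < BLOCK; i++) sum += out[i] & 0xFF;
        put(out, 148, String.format("%06o", sum));
        out[154] = 0;
        out[155] = ' ';
        System.arraycopy(data, 0, out, BLOCK, data.length);
        return out;
    }

    /** Two sample files followed by the two zero blocks that end a tar archive. */
    static byte[] demoArchive() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[][] parts = {
            tarEntry("etc/app.conf", "port=8080\n".getBytes(StandardCharsets.US_ASCII)),
            tarEntry("bin/run.sh", "#!/bin/sh\necho hi\n".getBytes(StandardCharsets.US_ASCII)),
            new byte[2 * BLOCK]
        };
        for (byte[] p : parts) out.write(p, 0, p.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] tar = demoArchive();
        buildIndex(tar).forEach((name, e) -> System.out.println(name + " -> " + e.rangeHeader()));
    }
}
```

In a real setup the index would be built once at upload time and stored next to the archive, so a later download only needs the index plus one ranged request. The sketch ignores long file names, pax extended headers, sparse files and links, all of which a complete implementation would have to handle.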

It's essential to understand that this tool is at an experimental stage and not all features of the tar file format are implemented at this time. If you want to see a feature developed/released sooner, please feel free to submit your PR ;)

Build

mvn clean compile package

Usage

usage: java -jar download-file-from-tar-in-s3.jar <options>
 -help              print this message
 -operation <arg>   Operation - It needs to be either upload or download


usage: java -jar download-file-from-tar-in-s3.jar -operation upload
            <options>
 -aws_bucket_name <arg>   S3 bucket name
 -aws_key_name <arg>      S3 key name
 -file <arg>              local file


usage: java -jar download-file-from-tar-in-s3.jar -operation download
            <options>
 -aws_bucket_name <arg>   S3 bucket name
 -aws_key_name <arg>      S3 key name
 -output <arg>            destination file
 -tar_file_entry <arg>    entry in the tar file

License

FOSSA Status
