
Comments (6)

cyb3rz3us commented on July 19, 2024

sensitive payloads can be identified by their size alone

Help me understand how this works for many of the targets that would be encrypted by 'age'?

For example, let's say I have a tarball of about 5000 pictures that comes in at around 25GB when encrypted. Then I have another tarball of documents that might come in at around 12GB when encrypted. And finally, a third tarball of videos, pictures, and documents at 42GB when encrypted. How does the size of the payload help to identify the payload?

I'm sure there's something I'm not understanding so please receive my question as only seeking to understand...not push-back...

from age.

colmmacc commented on July 19, 2024

Well let's say that the tarball is 'illegal' material that you have an interest in denying possession of. There are awful and cruel examples of this, but there are also righteous and good examples of this. For example the encrypted file might be the 'Bible' for a religion that an oppressive state has made illegal to possess, or it might be a tarball of material that a (corporate or state) whistle-blower leaked to a news-organization.

Without any padding, an exact byte-match of the illicit material is extremely suggestive circumstantial evidence of possession. With well-crafted padding, there is a greater range of potential content that encrypts to that size and the match is not as definitive.
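To make the size-matching point concrete, here is a minimal sketch of how padding widens the set of plaintexts that share one observable size. The `padded_size` helper and the 4096-byte block are assumptions chosen for illustration; they are not taken from this thread or from age itself.

```python
def padded_size(n: int, block: int = 4096) -> int:
    """Round n up to the next multiple of `block` (hypothetical padding rule)."""
    if n < 0 or block <= 0:
        raise ValueError("n must be >= 0 and block must be > 0")
    return -(-n // block) * block  # ceiling division, then scale back up

# Three different plaintext sizes that an observer could tell apart exactly...
sizes = [1_000_001, 1_000_500, 1_003_000]
# ...collapse to a single observable size once padded.
observed = {padded_size(s) for s in sizes}
print(observed)  # {1003520}
```

With this rule, any payload between 999,425 and 1,003,520 bytes looks the same from the outside, so a size match alone points at a range of candidates rather than one exact file.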


colmmacc commented on July 19, 2024

I should add too that padding can be a useful mitigation against another kind of attack: shared compression tables.

Suppose that the tarball is a gzipped backup of your email inbox and that you make that backup every day and send it to a Cloud service for storage. The size of the backup can be seen by anyone who can observe the upload.

Now suppose I send you an email every day, and also observe the size of the backup. Over time, by trying different strings in my email, I can statistically profile where there is overlap between the strings in my email and other strings in your inbox (other emails), or at least the frequency of them. This is because the compression table entries will be shared, so when there's overlap the output size is smaller than I would otherwise expect it to be. Exposing exact payload sizes makes these kinds of shared-compression-table attacks very practical.

For example: suppose I send you a 200 byte email that contains the string "TOP SECRET DOCUMENTS", over the weekend when you're not getting much other email, and the size of your backup only goes up by 12 bytes, I can guess that the string "TOP SECRET DOCUMENTS" appears elsewhere in your inbox.
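The effect described above can be reproduced in a few lines. This is only a sketch: the inbox contents, the `growth` helper, and the sender name are invented for illustration, and it uses Python's `zlib` (the same DEFLATE algorithm gzip uses) rather than an actual gzipped mailbox.

```python
import zlib

def backup_size(inbox: bytes) -> int:
    """Size of the compressed backup, which is what the observer can see."""
    return len(zlib.compress(inbox, 9))

def growth(inbox: bytes, guess: bytes) -> int:
    """How many bytes the backup grows when the attacker's email is added."""
    email = b"From: mallory\n" + guess + b"\n"  # hypothetical attacker email
    return backup_size(inbox + email) - backup_size(inbox)

# Hypothetical victim inbox; the second message contains the secret phrase.
inbox = (
    b"From: alice\nSubject: lunch\nSee you at noon.\n"
    b"From: bob\nSubject: budget\nTOP SECRET DOCUMENTS attached as discussed.\n"
)

hit = growth(inbox, b"TOP SECRET DOCUMENTS")   # phrase already in the inbox
miss = growth(inbox, b"QZXJ VKWPQ ZJXKVWQPZ")  # same length, no overlap

print("growth with matching guess:    ", hit)
print("growth with non-matching guess:", miss)
```

The matching guess grows the backup by fewer bytes, because the compressor encodes it as a short back-reference to the earlier occurrence instead of as new literal bytes. That difference is the signal the attacker reads off the exact size.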

Padding doesn't prevent these attacks: the attacker can pad their own email until the backup crosses a padding boundary. But it makes these attacks much more costly and slow, because only one bit of information is leaked each time a padding boundary is crossed, so they're not nearly as practical.
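As a rough illustration of why padding coarsens the signal, the sketch below applies a hypothetical 4096-byte padding rule to the compressed backup before the observer sees its size. The inbox, the `observed_size` helper, and the block size are all assumptions made for this example.

```python
import zlib

BLOCK = 4096  # hypothetical padding granularity

def observed_size(inbox: bytes) -> int:
    """Compressed size rounded up to the next multiple of BLOCK."""
    n = len(zlib.compress(inbox, 9))
    return -(-n // BLOCK) * BLOCK

# Hypothetical victim inbox containing the secret phrase.
inbox = (
    b"From: alice\nSubject: lunch\nSee you at noon.\n"
    b"From: bob\nSubject: budget\nTOP SECRET DOCUMENTS attached as discussed.\n"
)

with_hit = observed_size(inbox + b"From: mallory\nTOP SECRET DOCUMENTS\n")
with_miss = observed_size(inbox + b"From: mallory\nQZXJ VKWPQ ZJXKVWQPZ\n")

# Both probes land in the same padding bucket, so the observer sees no difference.
print(with_hit, with_miss)  # 4096 4096
```

To learn anything here, the attacker first has to add filler until the backup sits right at a bucket edge, and each probe then answers only "did it cross or not". That is the one bit per boundary crossing mentioned above.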


cyb3rz3us commented on July 19, 2024

Taking the second set of comments first, as you mentioned, padding does nothing to prevent that type of attack. Also, what you outline is an attack far beyond the scope of 'age' or really any tool used for only file encryption.

Now, the first set of comments...I'm a bit dubious about entities being able to reliably determine that someone has an encrypted form of a given work from the byte count alone. And even if they could, thinking only about obscurity for a moment, I think we first need to ask whether providing that type of obscurity is within the scope of 'age' itself. From my reading of FiloSottile's blog and the front page of this project, it seems to me that the goal is to provide a very easy-to-use and lightweight file encrypt/decrypt tool. Perhaps I'm wrong here but that's my interpretation.


colmmacc commented on July 19, 2024

In context that's absolutely not what I wrote, but I should have been more careful not to say 'prevent' so casually. For compression attacks, sufficient padding does increase the costs for attackers, well beyond practicality in most cases. One bit per padding-boundary crossing is very low bandwidth with which to try to work out the compression table collision.

It sounds like, in your threat model, you don't care about an attacker being able to identify the plaintext from a set of known plaintexts. That's ok, but it's a pretty non-standard assumption.


cyb3rz3us commented on July 19, 2024

I don't see this as particularly "my threat model"...I see it as the likely threat model relevant to the typical user of 'age'. Said differently, I don't see the typical 'age' user as one who is encrypting a lot of widely known and/or disseminated plaintexts.

The issue you describe is more of a privacy concern than an encryption concern and again, based on my interpretation of FiloSottile's writings re: PGP and the reason why 'age' was developed in the first place, I view 'age' as an encryption tool...nothing more. That's not to say it can't be more, but that is really the dev's decision...

