email2hash's People

Contributors

azadi, mrphs

email2hash's Issues

Preserve the order of hashes and index them in the output

We are currently sorting the list of hashes and then outputting the list to a file. We outline the reasons for doing so in the spec:

    The script computes the HMAC (see 4 below) for each email address in the
    CSV file and saves it to a list. The list is then sorted to make it easier
    to check for overlap and to hide the information about member activity
    before being written to disk.

Instead of doing this, we should preserve the order in which the email addresses were hashed (as we were doing earlier) and prefix each hash with its line number, like this:

1,<hash-1>
2,<hash-2>

The benefits of this:

  1. Comparisons become easier because the hashed list can be matched against the raw list by index. For example, if the raw list was:

and the hashed list is:

1,<hash-1>
2,<hash-2>

It's easy to check for overlap: look up the index of a matching hash and use it to find the corresponding email address in the raw list.

  2. We no longer need to deal with the memory overhead of a list that will have millions of entries; instead, we write each output line directly to the file as we process it.
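A minimal sketch of what the streaming, indexed output could look like. The function name `write_indexed_hashes`, the assumption that the email address sits in the first CSV column, and the sample key are all hypothetical and not taken from the existing script:

```python
import csv
import hashlib
import hmac
import io


def write_indexed_hashes(reader, writer, key, digestmod=hashlib.sha3_256):
    """Write one `<line number>,<hash>` row per input row, in input order.

    Rows are processed one at a time, so no list of hashes is kept in memory.
    Returns the number of rows written.
    """
    count = 0
    for count, row in enumerate(reader, start=1):
        # Assumption: the email address is the first column of each row.
        email = row[0]
        digest = hmac.new(key, email.encode("utf-8"), digestmod).hexdigest()
        writer.writerow([count, digest])
    return count


# Usage with in-memory files; real code would open the CSV files on disk.
source = io.StringIO("alice@example.com\nbob@example.com\n")
output = io.StringIO()
written = write_indexed_hashes(
    csv.reader(source), csv.writer(output), b"example-secret"
)
```

Because each row is written as soon as it is hashed, memory use stays constant regardless of how many addresses the input contains.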

Hashing Algorithm

We shouldn't support any hashing algorithm older than SHA-3. We could perhaps support BLAKE2 as well, but that is not required for now. Let's make SHA3-256 the default.
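One way this restriction might be expressed, as a sketch only. The allow-list, the constant names, and the `compute_hmac` helper are hypothetical; the algorithm names are the ones exposed by Python's standard `hashlib` module:

```python
import hashlib
import hmac

# Hypothetical allow-list: SHA-3 variants plus BLAKE2, nothing older.
ALLOWED_ALGORITHMS = ("sha3_256", "sha3_384", "sha3_512", "blake2b", "blake2s")
DEFAULT_ALGORITHM = "sha3_256"


def compute_hmac(key, message, algorithm=DEFAULT_ALGORITHM):
    """Return the hex HMAC of `message`, refusing algorithms off the list."""
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    # Look up the hashlib constructor by name and hand it to hmac.
    return hmac.new(key, message, getattr(hashlib, algorithm)).hexdigest()
```

Requesting an older algorithm such as `"sha256"` or `"md5"` raises `ValueError` instead of silently producing a weaker hash.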

Randomly generate a secret

We currently ask the user to input a secret key. We should generate one for them automatically if they don't want to choose their own.
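A possible shape for this, assuming Python's standard `secrets` module is acceptable as the source of randomness. The `resolve_secret` name and the 32-byte default are illustrative choices, not part of the current script:

```python
import secrets


def resolve_secret(user_secret=None, nbytes=32):
    """Return the user's secret as bytes, or a freshly generated random one.

    The generated secret is a hex string so it can be displayed and re-entered.
    """
    if user_secret:
        return user_secret.encode("utf-8")
    # token_hex(nbytes) yields 2 * nbytes hex characters of CSPRNG output.
    return secrets.token_hex(nbytes).encode("ascii")
```

If the secret is generated, the script would also need to show it to the user, since the same key is required to reproduce the hashes later.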

Dependabot couldn't find a requirements.txt for this project

Dependabot couldn't find a requirements.txt for this project.

Dependabot requires a requirements.txt to evaluate your project's current Python dependencies. It had expected to find one at the path: /requirements.txt.

If this isn't a Python project, or if it is a library, you may wish to disable updates for it from within Dependabot.

You can mention @dependabot in the comments below to contact the Dependabot team.
