ipscrub's Introduction

ipscrub

ipscrub is an IP address anonymizer for nginx log files. It's an nginx module that generates an IP-based hash. You can use this hash to link requests from the same source, without identifying your users by IP address.

Screenshot of nginx logs when using ipscrub

Security Model

  1. On initialization, and again every PERIOD, generate a salt using 128 bits from arc4random_buf().
  2. On each request, generate masked IP address as HASH(salt ++ IP address).
  3. Log masked IP address.

ipscrub uses arc4random to generate random nonces (see Theo de Raadt's talk on arc4random for a great overview). On Linux this requires installing libbsd (package libbsd-dev on Ubuntu/Debian).

ALSO NOTE: the generated hash WILL change on each PERIOD transition, so you will only have continuity within each PERIOD. But because users can transition between networks at any time (e.g. wifi -> cellular), you'd have this type of issue even if you were storing raw IPs.
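
The three steps above can be sketched in Python as follows. This is a toy model, not the module's C implementation. SHA-256 as HASH, os.urandom() standing in for arc4random_buf(), truncation to 6 base64 characters, the 600-second default period, and the IpScrubber name are all assumptions made for illustration.

```python
import base64
import hashlib
import os
import time


class IpScrubber:
    """Toy model of the rotate-salt-and-hash scheme (hypothetical class)."""

    def __init__(self, period_seconds=600, clock=time.monotonic):
        self.period_seconds = period_seconds  # hypothetical default
        self.clock = clock
        self._rotate()

    def _rotate(self):
        # Step 1: 128 bits of OS randomness, standing in for arc4random_buf().
        # The salt lives only in memory and is overwritten on rotation.
        self.salt = os.urandom(16)
        self.period_start = self.clock()

    def mask(self, ip):
        # Regenerate the salt once the current PERIOD has elapsed.
        if self.clock() - self.period_start >= self.period_seconds:
            self._rotate()
        # Step 2: HASH(salt ++ IP address); SHA-256 is an assumption.
        digest = hashlib.sha256(self.salt + ip.encode("ascii")).digest()
        # Step 3: log a short form; here, the first 6 base64 characters.
        return base64.b64encode(digest)[:6].decode("ascii")
```

Within one PERIOD, `IpScrubber().mask("203.0.113.7")` returns the same string on every call for that instance. After a rotation the old salt is gone, so earlier masked values cannot be recomputed from the IP.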

Threat Model

  1. Government presents you with an IP address and demands identification of user corresponding to that address.
  2. Government identifies a user e.g. by email address, and demands IP address they had at some point in time.

In threat scenario (1), the adversary's goal is to compute the masked IP corresponding to a target IP address. This is only possible if the demand is made before the end of the current PERIOD, because the salt needed to recompute the hash is replaced at each PERIOD transition.

Scenario (2) is defended against because the server operator does not know the salt and cannot infer it from the request timestamp: the salt is generated from a nonce that is only stored in memory. The server operator would have to be an accomplice in this case, but an accomplice could more simply record the unmasked IP. So this security/threat model does not defend against a malicious server operator, but that is not the point. It does defend against an honest server operator being compelled in threat scenarios (1) and (2).

Usage

Installation

Building From Source

ipscrub can be built statically with nginx or as a dynamic module. See the Makefile for examples of both ways.

Packages

Configuration

In your nginx.conf,

  1. At the top-level, load the module by adding the line load_module ngx_ipscrub_module.so; (NOTE: only if you built as a dynamic module).
  2. Set ipscrub_period_seconds <NUM SECONDS PER PERIOD>; (optional).
  3. In your log_format directives, replace $remote_addr with $remote_addr_ipscrub.
  4. Reload your nginx config.

NOTE: nginx may still leak IP addresses in the error log. If this is a concern, disable error logging or wipe the log regularly.
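
As a rough illustration of the steps above, a configuration might look like the fragment below. The directive names come from this README. The 3600-second period, the `scrubbed` format name, the log path, and the placement of ipscrub_period_seconds inside the http block are assumptions, so check them against your build.

```nginx
# Top level: only needed when ipscrub was built as a dynamic module.
load_module ngx_ipscrub_module.so;

events {}

http {
    # Optional: length of each PERIOD in seconds (3600 is an arbitrary example).
    # Placement inside http {} is an assumption.
    ipscrub_period_seconds 3600;

    # Same fields as nginx's predefined "combined" format, with
    # $remote_addr replaced by $remote_addr_ipscrub.
    log_format scrubbed '$remote_addr_ipscrub - $remote_user [$time_local] '
                        '"$request" $status $body_bytes_sent '
                        '"$http_referer" "$http_user_agent"';

    # Hypothetical log path.
    access_log /var/log/nginx/access.log scrubbed;

    # Optional, per the note above: send error log output to /dev/null.
    # error_log /dev/null crit;
}
```

Running `nginx -t` checks the configuration before `nginx -s reload` applies it.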

Running Tests

make test

Checking for Updates

make check-up-to-date

This will have a non-zero exit code if you aren't up-to-date, so you can automate regular checks.

Changelog

  • 1.0.1 fixed vulnerability to unmasking hashed IPs (thanks to @marcan)
  • 1.0.0 initial release

GDPR

GDPR went into effect on May 25, 2018. It regulates the handling of personal data about your users, including IP addresses.

From https://www.eugdpr.org/gdpr-faqs.html:

What constitutes personal data?

Any information related to a natural person or ‘Data Subject’, that can be used to directly or indirectly identify the person. It can be anything from a name, a photo, [...], or a computer IP address.

The hashes generated by ipscrub let you correlate nginx log entries by IP address, without actually storing IP addresses, reducing your GDPR surface area.

YAGNI

Why are you logging IP addresses anyway? You Ain't Gonna Need It. If you want geolocation, just use MaxMind's GeoIP module in conjunction with ipscrub.

License

Copyright 2018 Mason Simon

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. If you use this module in a production service that has an associated privacy policy, that privacy policy must include this text "This service uses ipscrub (http://www.ipscrub.org)." or similar text in the same spirit, which includes that link to http://www.ipscrub.org.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Alternative Licensing

If you would like to use ipscrub without attribution in your privacy policy, or to discuss custom development, get in touch and we can work something out (email address is in my GitHub profile, @masonicb00m on Twitter).

ipscrub's People

Contributors

fpruitt, masonicboom

ipscrub's Issues

Hash changes within `PERIOD`

Hello,
I just installed ipscrub, and I'm seeing unexpected behaviour.

The screenshot shows the unscrubbed access.log on top and the scrubbed access_scrubbed.log below.

The hash of the IP 146.223.14.99 changes from ndjXjp to JEb+/G and back again within the same second. Looking at my access.log, this happens multiple times. Most requests from that IP have ndjXjp, but a few have JEb+/G as hashes.

We can also see that 66.69.106.150 has the hashes c5W8yh and Tz0Hqw.
scrot_2022-12-15_19-38-54_screenshot

Is this somehow expected behaviour?

I expected the same IP always transforms into the exact same hash within a PERIOD.

I'm on nginx/1.22.1 and compiled ipscrub as a dynamic module.

Kind regards

Issues compiling on alpine

Did anyone have success compiling this on alpine?

In file included from /usr/include/bsd/sys/cdefs.h:51,
                 from /usr/include/bsd/libutil.h:45,
                 from /usr/include/bsd/stdlib.h:39,
                 from ../ipscrub-1.0.1/ipscrub/src/ngx_ipscrub_support.c:9:
/usr/include/sys/cdefs.h:1:2: error: #warning usage of non-standard #include <sys/cdefs.h> is deprecated [-Werror=cpp]
    1 | #warning usage of non-standard #include <sys/cdefs.h> is deprecated
      |  ^~~~~~~
In file included from /usr/include/bsd/sys/cdefs.h:51,
                 from /usr/include/bsd/stdlib.h:48,
                 from ../ipscrub-1.0.1/ipscrub/src/ngx_ipscrub_support.c:9:
/usr/include/sys/cdefs.h:1:2: error: #warning usage of non-standard #include <sys/cdefs.h> is deprecated [-Werror=cpp]
    1 | #warning usage of non-standard #include <sys/cdefs.h> is deprecated
      |  ^~~~~~~
cc1: all warnings being treated as errors
make[1]: *** [objs/Makefile:1329: objs/addon/src/ngx_ipscrub_support.o] Error 1

obfuscate error log, too

this is a feature request:
The format of the error log cannot easily be configured. Is it possible to obfuscate IP addresses in the error log within the scope of this module?

make a release

This looks great. I tried it out and it works just as advertised.
Can you please make a release, so I can package it with a more useful version than some development hash?

Your randomness is bad and you should feel bad.

ipscrub uses ngx_random to generate random nonces. ngx_random is defined as the C random() function on non-Windows platforms, and rand() on Windows. NOTE: this is not a cryptographically secure RNG, but for the following threat model, that is ok.

No, it is not OK. The only components going into the nonce for each period are the timestamp (in seconds - trivial to guess from the log timestamps) and random(), which is a 31-bit number and can be trivially bruteforced. Brute-forcing 32 bits is utterly trivial, which means you can unmask ~every IPv4 uniquely. And since the hash is stored as 6 base64 digits, that's 36 bits, which means most of the outputs will not map to valid IPv4 addresses, which means you can build a trivial checker oracle to find out if your guessed salt is correct. And now you have all the bits and pieces you need to unmask every IPv4 address, in perpetuity, with completely trivial CPU resources and ~16GB of RAM for the resulting lookup table.

On top of this, the above is assuming the README is correct, but it isn't. As far as I can tell the only data going into the nonce is a single call to ngx_random() per period, not even the timestamp. So your randomness is 31 bits, into 36 bit hashes, which means all you need are ~7 unique IPs logged per period to completely recover the nonce for that period with basic bruteforcing, and that's even assuming ngx_random() were actually secure, which it isn't.

Seriously, this is broken. Please use at least 128 bits of cryptographically secure randomness for your periodic nonces, preferably from /dev/urandom or equivalent. Make sure whatever you use is forward-secure (cannot be rolled back to previous nonces).
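
The figures in this report can be roughly reproduced with a back-of-the-envelope calculation. The sketch below models the hash as a random function and assumes 4 bytes per lookup-table entry. Both are simplifying assumptions, not details confirmed by the report.

```python
import math

SALT_BITS = 31   # output width of random(), per the report
HASH_BITS = 36   # 6 base64 characters at 6 bits each
IPV4_BITS = 32

# Fraction of the 2**36 possible hashes that some IPv4 address maps to
# under one fixed salt, modelling the hash as a random function.
reachable = 1 - math.exp(-(2 ** IPV4_BITS) / (2 ** HASH_BITS))

# A wrong salt guess survives one logged hash with probability `reachable`,
# so each logged hash contributes this many bits of filtering (about 4).
bits_per_hash = -math.log2(reachable)

# Logged hashes needed before fewer than one wrong salt is expected to survive.
hashes_needed = math.ceil(SALT_BITS / bits_per_hash)

# Size of a table with one 4-byte entry per IPv4 address (assumed layout).
table_gib = (2 ** IPV4_BITS * 4) / 2 ** 30
```

Under these assumptions `hashes_needed` comes out in the single digits, the same range as the "~7 unique IPs" quoted above, and `table_gib` matches the "~16GB" estimate.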

Cannot build

Trying to build under Debian 10.2:

/usr/local/src/nginx/modules/ipscrub-1.0.1/ipscrub/src/ngx_ipscrub_support.c:9:24: fatal error: bsd/stdlib.h: No such file or directory
 #include <bsd/stdlib.h>
                        ^
compilation terminated.
objs/Makefile:1655: recipe for target 'objs/addon/src/ngx_ipscrub_support.o' failed
make[1]: *** [objs/addon/src/ngx_ipscrub_support.o] Error 1
make[1]: Leaving directory '/usr/local/src/nginx/nginx-1.17.8'
Makefile:11: recipe for target 'install' failed
make: *** [install] Error 2

single same daily salt

thanks for this great code & setup!
(been scouting for something like this (a bit dumbfoundedly for why so long!) for years...)

better than my C patching/hacking -- we love modules.

anyhoo... i've tweaked it to use a daily source (REST GET API) of a salt value so that 1000s of servers will all get the same anonymized IP address (which changes daily).

this is our current policy and preference at archive.org (i'm the main/longest tech there).

question/issue is - would you like to see/review the gist to see if interested in mainstreaming it?

entryway and suggestion would be optional ipscrub_salt_rest_source (or something like that).
I've got it working with a simple http TCP/http fetch and am hoping to go live soon.

So it's kind of a question of "is it interesting to see/consider what we're up to - or is this just noise?" :)

thanks!
