masonicboom / ipscrub
IP address anonymizer module for nginx
Hi,
is it somehow possible to reprocess (reparse) existing nginx logs to hash IPs?
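One way to approach this outside nginx is a small offline script that rewrites the client address at the start of each line. The following is a minimal sketch, not ipscrub's own algorithm: it assumes the address is the first field of each line (as in the default "combined" log format), handles IPv4 only, and uses salted SHA-256 truncated to six base64 characters. The function names and the salt handling are illustrative assumptions, so the tokens will not match those written by the module.

```python
import base64
import hashlib
import re

# Matches an IPv4 address at the very start of a log line.
LEADING_IPV4 = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})(?=\s)")


def scrub_ip(ip: str, salt: bytes) -> str:
    """Return a 6-character base64 token for ip under the given salt."""
    digest = hashlib.sha256(salt + ip.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")[:6]


def scrub_line(line: str, salt: bytes) -> str:
    """Replace the leading IPv4 address of one log line, if there is one."""
    return LEADING_IPV4.sub(lambda m: scrub_ip(m.group(1), salt), line, count=1)


def scrub_log(src_path: str, dst_path: str, salt: bytes) -> None:
    """Copy src_path to dst_path with leading IPv4 addresses hashed."""
    with open(src_path, encoding="utf-8", errors="replace") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(scrub_line(line, salt))
```

The salt should be long and random (for example 16 bytes from os.urandom) and discarded once the rewrite is done, otherwise the tokens can be reversed by hashing every IPv4 address.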
Hello,
I just installed ipscrub, and I'm seeing unexpected behaviour.
The screenshot shows the unscrubbed access.log on top and the scrubbed access_scrubbed.log below.
The hash of the IP 146.223.14.99 changes from ndjXjp to JEb+/G and back again within the same second. Looking at my access.log, this happens multiple times. Most requests from that IP have ndjXjp, but a few have JEb+/G as hashes.
We can also see that 66.69.106.150 has the hashes c5W8yh and Tz0Hqw.
Is this somehow expected behaviour?
I expected the same IP to always transform into the exact same hash within a PERIOD.
I'm on nginx/1.22.1 and compiled ipscrub as a dynamic module.
Kind regards
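To measure how often this happens, the two logs can be compared line by line. This is a rough diagnostic sketch under two assumptions that are not stated in the report: both files contain the same requests in the same order, and the client field is the first whitespace-separated token on every line. The function names are made up for illustration. It should be run on lines from a single PERIOD, since a change of hash across a period boundary is expected.

```python
from collections import defaultdict


def hashes_per_ip(raw_lines, scrubbed_lines):
    """Map each raw client address to the set of scrubbed tokens seen for it.

    Assumes line i of both inputs describes the same request and that the
    client field is the first whitespace-separated token on each line.
    """
    seen = defaultdict(set)
    for raw, scrubbed in zip(raw_lines, scrubbed_lines):
        raw_fields = raw.split()
        scrubbed_fields = scrubbed.split()
        if raw_fields and scrubbed_fields:
            seen[raw_fields[0]].add(scrubbed_fields[0])
    return seen


def inconsistent_ips(raw_lines, scrubbed_lines):
    """Return only the addresses that map to more than one token."""
    return {ip: tokens
            for ip, tokens in hashes_per_ip(raw_lines, scrubbed_lines).items()
            if len(tokens) > 1}
```

Passing the two open file objects (access.log and access_scrubbed.log) as the arguments lists every address that received more than one hash.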
"ipscrub uses ngx_random to generate random nonces. ngx_random is defined as the C random() function on non-Windows platforms, and rand() on Windows. NOTE: this is not a cryptographically secure RNG, but for the following threat model, that is ok."
No, it is not OK. The only components going into the nonce for each period are the timestamp (in seconds, trivial to guess from the log timestamps) and random(), which is a 31-bit number and can be trivially brute-forced. Brute-forcing 32 bits is utterly trivial, which means you can unmask ~every IPv4 address uniquely. And since the hash is stored as 6 base64 digits, that's 36 bits, which means most of the outputs will not map to valid IPv4 addresses, which means you can build a trivial checker oracle to find out if your guessed salt is correct. And now you have all the bits and pieces you need to unmask every IPv4 address, in perpetuity, with completely trivial CPU resources and ~16GB of RAM for the resulting lookup table.
On top of this, the above is assuming the README is correct, but it isn't. As far as I can tell the only data going into the nonce is a single call to ngx_random() per period, not even the timestamp. So your randomness is 31 bits, into 36 bit hashes, which means all you need are ~7 unique IPs logged per period to completely recover the nonce for that period with basic bruteforcing, and that's even assuming ngx_random() were actually secure, which it isn't.
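The numbers in this report can be checked with a few lines of arithmetic. This is a back-of-the-envelope model, assuming the hash behaves like a uniformly random function. The constant and function names are illustrative.

```python
HASH_BITS = 6 * 6   # six base64 digits, 6 bits each
IPV4_BITS = 32      # size of the IPv4 address space
NONCE_BITS = 31     # one random() output, as described in the report

# Largest share of the 36-bit output space that all IPv4 addresses can
# occupy under one nonce (hash collisions only make it smaller).
coverage = 2 ** IPV4_BITS / 2 ** HASH_BITS


def surviving_wrong_nonces(observed_hashes: int) -> float:
    """Estimated wrong nonces still consistent with k observed hashes."""
    return 2 ** NONCE_BITS * coverage ** observed_hashes


# One 4-byte IPv4 address per table entry, 2**32 entries.
table_gib = 2 ** IPV4_BITS * 4 / 2 ** 30

print(coverage)                    # 0.0625
print(surviving_wrong_nonces(7))   # 8.0
print(surviving_wrong_nonces(8))   # 0.5
print(table_gib)                   # 16.0
```

Under this model a wrong nonce explains any single observed hash with probability at most 1/16, so after about seven or eight distinct hashes from one period only the correct nonce is expected to remain.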
Seriously, this is broken. Please use at least 128 bits of cryptographically secure randomness for your periodic nonces, preferably from /dev/urandom or equivalent. Make sure whatever you use is forward-secure (cannot be rolled back to previous nonces).
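As an illustration of that recommendation (not of how ipscrub is implemented), here is a minimal Python sketch. It draws a fresh 128-bit nonce from the operating system's CSPRNG for each period and keys an HMAC-SHA-256 with it. HMAC is a substitution made here for the sketch, and the function names are assumptions. Because every nonce is an independent draw and the old one is simply discarded, a later nonce reveals nothing about an earlier one.

```python
import base64
import hashlib
import hmac
import secrets

NONCE_BYTES = 16  # 128 bits


def new_period_nonce() -> bytes:
    """Draw an independent nonce from the OS CSPRNG (urandom or equivalent)."""
    return secrets.token_bytes(NONCE_BYTES)


def scrub_ip(ip: str, nonce: bytes) -> str:
    """Return a 6-character base64 token for ip, keyed by the period nonce."""
    mac = hmac.new(nonce, ip.encode("ascii"), hashlib.sha256).digest()
    return base64.b64encode(mac).decode("ascii")[:6]
```

At each period boundary the caller would replace the stored nonce with new_period_nonce() and keep no copy of the previous value.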
this is a feature request:
The format of the error log cannot easily be configured. Is it possible to obfuscate IP addresses in the error log within the scope of this module?
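Until something like this exists in the module, one workaround is to post-process the error log outside nginx. The sketch below is only that, a workaround and not a module feature. It assumes the address appears in the usual "client: ADDRESS" field of nginx error lines and replaces just that field with a salted SHA-256 token. Addresses that appear elsewhere in a message (for example in upstream details) are left untouched, and the function name is illustrative.

```python
import base64
import hashlib
import re

# The "client: <address>" field found in nginx error log entries.
CLIENT_FIELD = re.compile(r"(\bclient: )([0-9A-Fa-f:.]+)")


def scrub_error_line(line: str, salt: bytes) -> str:
    """Replace the client address in one error log line with a short token."""
    def replace(match):
        digest = hashlib.sha256(salt + match.group(2).encode("ascii")).digest()
        return match.group(1) + base64.b64encode(digest).decode("ascii")[:6]
    return CLIENT_FIELD.sub(replace, line)
```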
thanks for this great code & setup!
(been scouting for something like this (a bit dumbfoundedly for why so long!) for years...)
better than my C patching/hacking -- we love modules.
anyhoo... i've tweaked it to use a daily source (REST GET API) of a salt value so that 1000s of servers will all get the same anonymized IP address (which changes daily).
this is our current policy and preference at archive.org (i'm the main/longest tech there).
question/issue is - would you like to see/review the gist to see if interested in mainstreaming it?
entryway and suggestion would be an optional ipscrub_salt_rest_source (or something like that).
I've got it working with a simple TCP/HTTP fetch and am hoping to go live soon.
So it's kind of a question of "is it interesting to see/consider what we're up to - or is this just noise?" :)
thanks!
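The idea described here, many servers sharing one salt that changes daily, can be sketched as follows. This is not the poster's gist, which is not shown. The class name, the injected fetch callable, and the use of HMAC-SHA-256 are all assumptions made for illustration. In a real deployment fetch would wrap an HTTP GET against the salt service.

```python
import base64
import datetime
import hashlib
import hmac
from typing import Callable, Dict


class DailySaltCache:
    """Fetch the shared salt once per day and keep only the current one."""

    def __init__(self, fetch: Callable[[datetime.date], bytes]):
        self._fetch = fetch
        self._salts: Dict[datetime.date, bytes] = {}

    def salt_for(self, day: datetime.date) -> bytes:
        if day not in self._salts:
            # Replacing the dict drops the previous day's salt.
            self._salts = {day: self._fetch(day)}
        return self._salts[day]


def scrub_ip(ip: str, salt: bytes) -> str:
    """Return a 6-character base64 token for ip under the shared salt."""
    mac = hmac.new(salt, ip.encode("ascii"), hashlib.sha256).digest()
    return base64.b64encode(mac).decode("ascii")[:6]
```

Two servers that receive the same salt for a given day produce the same token for the same address, and the token changes when the salt does. The salt service itself then becomes the thing to protect, since anyone holding a day's salt can reverse that day's tokens by enumeration.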
Did anyone have success compiling this on alpine?
In file included from /usr/include/bsd/sys/cdefs.h:51,
from /usr/include/bsd/libutil.h:45,
from /usr/include/bsd/stdlib.h:39,
from ../ipscrub-1.0.1/ipscrub/src/ngx_ipscrub_support.c:9:
/usr/include/sys/cdefs.h:1:2: error: #warning usage of non-standard #include <sys/cdefs.h> is deprecated [-Werror=cpp]
1 | #warning usage of non-standard #include <sys/cdefs.h> is deprecated
| ^~~~~~~
In file included from /usr/include/bsd/sys/cdefs.h:51,
from /usr/include/bsd/stdlib.h:48,
from ../ipscrub-1.0.1/ipscrub/src/ngx_ipscrub_support.c:9:
/usr/include/sys/cdefs.h:1:2: error: #warning usage of non-standard #include <sys/cdefs.h> is deprecated [-Werror=cpp]
1 | #warning usage of non-standard #include <sys/cdefs.h> is deprecated
| ^~~~~~~
cc1: all warnings being treated as errors
make[1]: *** [objs/Makefile:1329: objs/addon/src/ngx_ipscrub_support.o] Error 1
Trying to build under Debian 10.2:
/usr/local/src/nginx/modules/ipscrub-1.0.1/ipscrub/src/ngx_ipscrub_support.c:9:24: fatal error: bsd/stdlib.h: No such file or directory
#include <bsd/stdlib.h>
^
compilation terminated.
objs/Makefile:1655: recipe for target 'objs/addon/src/ngx_ipscrub_support.o' failed
make[1]: *** [objs/addon/src/ngx_ipscrub_support.o] Error 1
make[1]: Leaving directory '/usr/local/src/nginx/nginx-1.17.8'
Makefile:11: recipe for target 'install' failed
make: *** [install] Error 2
This looks great. I tried it out and it works just as advertised.
Can you please make a release, so I can package it with a more useful version than some development hash?