Comments (23)
Thanks for all the positive feedback. The chacha8 seems like a good idea. I'll implement it in a dev branch and see how it performs.
from srandom.
Solid work Steve!
from srandom.
I wouldn't mind double checking this work. How about a Makefile and instructions to run the "attack"?
Bug 1 & 3 is pretty simple fix. Spinlocks are pretty normal stuff in kernel code. It's the best way to to protect code that should be run by only 1 thread. I was not able to reproduce the problem of a 'block forever'. Do you have an alternative solution?
Clone the repo, and click attack/test.html
. If your system installed your own work, It will be detected. Otherwise, a true CSPRNG will not trigger the alarm.
Distinguishable from a CSPRNG means your randomness is not qualified for cryptographical solutions. This is already called a break.
If this is your required instruction and you can reproduce it, please update your README so that downstream developers (especially some custom android kernel) will receive alerts.
from srandom.
It appears that attack/test.html
only works in Chrome/Chromium. From my testing, it fails in Firefox and Brave, which means they're probably using the getrandom()
system call to get randomness for window.crypto.getRandomValues()
rather than reading /dev/urandom
directly.
from srandom.
Unfortunately, the overall effect of @Sc00bz' work isn't merely "here are some bugs you should fix". Rather, it calls into question whether the entire project is safe - or even necessary.
The bar of whether something should be used as a replacement for /dev/urandom is much higher than a couple of bugfixes.
from srandom.
the current Linux kernel RNG is based on ChaCha20
I'm pretty sure they use normal ChaCha20. Which can only use 128 bit SIMD. AVX2 and AVX-512 are 256 bit and 512 bit SIMD respectively. My suggestion would be 2-4x faster on top of the 2.5x. Since most/all current x86 CPUs have at least AVX2. Also the speed up from not shuffling the state in registers because it's registers of 4, 8, or 16 ChaCha states. So it's at least 5x faster with AVX2 and 10x with AVX-512.
Note that's for actual AVX2 and AVX-512. AMD has put out CPUs with those instruction sets but are working on half the SIMD width at a time. It's faster than half width SIMD by a little but not 2x. AMD Zen 2 moved from this to full width AVX2 and AMD Zen 4 added half width AVX-512.
from srandom.
I basically rewrote the whole thing. ChaCha8 uses scalar, SSE2, SSSE3, AVX, XOP, AVX2, AVX-512, and NEON in 160 lines. It runs with 16 states of ChaCha and outputs 1 KiB per call. I need to do some testing. I want to make sure XOP and AVX-512 work but I'd need to find something that runs those. Currently XOP is commented out because the GCC documentation is lacking. Also I don't think AMD has made a CPU with XOP in 7 years and Intel never implemented it. Oh right I wanted to do a condition variable but still haven't looked up how that works in a Linux kernel module. I'll probably do a pull request tomorrow.
I have three methods for reseeding:
Each set of states reseed after 4 TiB with get_random_bytes().
Each set of states reseed after 4 TiB - 1 KiB with the next 1 KiB.
Each set of states repeat output after 16 ZiB (2**74 bytes).
from srandom.
I wouldn't mind double checking this work. How about a Makefile and instructions to run the "attack"?
Bug 1 & 3 is pretty simple fix. Spinlocks are pretty normal stuff in kernel code. It's the best way to to protect code that should be run by only 1 thread. I was not able to reproduce the problem of a 'block forever'. Do you have an alternative solution?
from srandom.
BTW: I do highly appreciate the hard work you put into this.
from srandom.
@Sc00bz Are you willing to peer review v2.0?
from srandom.
LOL, if everyone abandoned and deleted every project that had an exploit, there would be nothing on the web. If you feel it's useless, unstar this repo and move on... Traffic insights of this project says otherwise, so other people since that past 8 years have found some use from it.
I'll assume you know that going from 1.x to 2.x is NOT merely bug fixes... If Sc00bz is unwilling to test/review v2.0, then I'll still try to improve this project without his input.
from srandom.
Spinlocks are pretty normal stuff in kernel code. It's the best way to to protect code that should be run by only 1 thread. I was not able to reproduce the problem of a 'block forever'. Do you have an alternative solution?
Did you run dd if=/dev/srandom of=/dev/null bs=1M
in 17+ times or write a script to do the same? Also this only will work in standard mode because UHS has 32 arrays and only 16 can be locked currently.
You don't need the spin lock. Just lock around these lines because update_sarray*()
locks:
Lines 359 to 364 in e4cb549
Or on line 339 unlock and then lock again.
Lines 336 to 341 in e4cb549
Also I didn't mention this in the blog because it's a small thing but you need to lock on the work thread. It's possible for seed_PRND_*()
to get undone by another thread.
I'm not going to review v2 unless there's crypto. I'd suggest using ChaCha8 (that's 8 vs 20 rounds. It's weaker but should be fine) and do 4x, 8x, or 16x in parallel using SIMD (SSE2, AVX, AVX2, and AVX-512 for x86 and NEON for ARM). Don't bother with endianness or [de]interleaving the instances because it doesn't matter for a PRNG. For seeding/reseeding, I'd suggest doing what /dev/urandom does unless you can grab from that or getrandom()
. I did something like this in 2008 with SHA-256 and SSE2.
Optionally you could use VAES instead when it's available, but you'll need benchmarks to see if that's faster. You could use a spin lock so multiple threads can work at the same time but on different states. Since the current spin lock does very little for performance. Also you don't need to randomly select. Just rotate through them all. Also you should not throwaway generated random data. Mark how much in the buffer is used and refill it when there's nothing left. Since a lot of calls are going to be for nonce/key sized data 12-32 bytes.
from srandom.
I'd suggest using ChaCha8 (that's 8 vs 20 rounds. It's weaker but should be fine)
This is a valid suggestion provided by recent cryptanalysis. According to Jean-Philippe Aumasson, ChaCha8 is still cryptographically secure, while ChaCha20 is over-cautious:
5.3 How many rounds?
Our goal is to propose numbers of rounds for which we have strong confidence that the algorithm will never be wounded, let alone broken, using the terminology defined in §4.4. Based on the previous research and our cryptanalysis experience, in particular as related to these algorithm (sic), we propose the following:
- AES: ...
- BLAKE2: ...
- ChaCha: 8 rounds instead of 20 (that is, ChaCha8), yielding a 2.5× speed-up.
- SHA-3: ...
These suggested numbers of rounds are probably imperfect choices in terms of consistency, yet the security margins of these four families of algorithms are more consistent with these new rounds numbers than with their original ones. Note that AES’ initial security margin was the thinnest of all, not because it’s the older design, but because of the initial not-overconservative choice of rounds vs. best known attack. AES is thus the primitive for which we propose the smallest change.
Because the current Linux kernel RNG is based on ChaCha20 and provides 400+ MBps throughput with 64k chunk sizes (according to my benchmark in #13), and knowing that ChaCha8 will provide a 2.5× speed up, we can expect close to 1+ GBps throughput. While this doesn't reach the throughput of xorshift64, it's cryptographically secure and still provides incredible performance.
from srandom.
LOL, if everyone abandoned and deleted every project that had an exploit, there would be nothing on the web. If you feel it's useless, unstar this repo and move on
No one is suggesting you abandon the project (that I've seen so far). Instead we're advising you to either move away from xorshift64 to a cryptographically secure primitive (EG, ChaCha8), or remove the claims in the README.md that it's secure as well as the advice to replace /dev/urandom
.
Traffic insights of this project says otherwise, so other people since that past 8 years have found some use from it.
I find value in this project as a fast RNG in kernel space (even though it might be a better fit in user space). I initially stumbled on this project researching the historic OpenBSD /dev/{a,p,s}random
devices. I was intrigued when I came here, but was dismayed when I read the README.md followed by reading the source code.
Another RNG Linux kernel module that I stumbled on is https://github.com/Error916/LFSR_module which uses a 128-bit linear feedback shift register, but the performance is terrible. It's also (obviously) insecure, but being as slow as it is reduces my interest in the project.
Anyway, your project is great, we just want the security claims to be accurate.
from srandom.
So the potential is there to outperform the existing xorshift64 implementation? Win-win!
from srandom.
Personally, I don't think ChaCha8 will outperform xorshift... We'll need to wait until it's implemented to see how it all performs.
BTW: I know there's probably not much interest (in this thread in particular), but I just pushed a totally revamped v2beta using updated PRNGS and implementing fixes and recommendations.
I replace SplitMix with wyhash and I replaced an old xorshift128 with xoshiro256**.
I added many new things like random buffer selection (instead of sticking to one for the entire request), random shuffle, random selection of PSRNG and a totally separate PRNG that is "internal to the module only" to make all these random events unexposed. Fixed the locking, lots of cleanup and simplification.
The README is not complete and ChaCha8 is not implemented yet. I decided to use ChaCha8 in standard mode and UHS will use this updated XorShift if it's good enough.
It's about 50% slower then srandom 1.x.
from srandom.
"ChaCha8" in the second to last sentence caught my eye. So I missed the multiple times where you said you didn't implement it and went looking for it. Anyway I accidentally broke version 2's UHS mode. I'm also not sure how you were able to benchmark it because it should just segfault. Since you are writing to unallocated memory and overwriting a pointer with random. Also this won't compile on 32 bit architectures and you didn't fix an off by one bug—right I never mentioned it because it's a minor bug that didn't really matter until now. I guess it still doesn't really matter. Anyway looking forward to your ChaCha8 implementation.
from srandom.
OK, now I'm going to need a bit of help... I want to make sure this is solid. I found an implementation of chacha20 that I was able to put into kernel code. It seems to work, but runs about 80MB/s slower than urandom. (I guess this would be because it's using the misc-device and not optimized).
If anyone here is able to convert the chacha20 to chacha8, it would be appreciated... (and fix the bug, you just pointed out)
I'm assuming once chacha20 is modified to chacha8, we should get close to the goal speeds.
from srandom.
OK, I think I figured it out... I found here a reference; Perform the ChaCha rounds in sets of two, Column round and Diagonal round
So, I just need to simply change line 675 from 10
to 4
, to make it from chacha20 to chacha8.
https://github.com/josenk/srandom/blob/v2beta/srandom.c#L675
from srandom.
Hey so I ran into an issue with Linux and fixed it. There is still one small thing, apparently a compile feature in GCC is C++ only: __attribute__((target("...")))
.
I can add the runtime test for which instructions sets are available but the number of files goes from 1 to 7 because each file needs to be compiled with -msse2
, -mssse3
, etc. Unless you know of a better way to do this. I think it can still be 1 file if there are defines like __AVX2__
for all of these SSE2, SSSE3, AVX, XOP, AVX2, and AVX-512. But I don't know how to detect those features in a makefile and add the appropriate flags.
I'll try to figure something out tomorrow. Also I wasn't able to test XOP or AVX-512.
from srandom.
There are defines for cpu flags...
Is this what you're looking for??? Creating compile time option flags for the detected cpu flags? (This list is not complete and is just for an example)
OPTS += $(shell cat /proc/cpuinfo |grep -q "avx " && echo "-mavx")
OPTS += $(shell cat /proc/cpuinfo |grep -q "avx2" && echo "-mavx2")
OPTS += $(shell cat /proc/cpuinfo |grep -q "avx512" && echo "-mavx512")
OPTS += $(shell cat /proc/cpuinfo |grep -q "sse " && echo "-msse")
OPTS += $(shell cat /proc/cpuinfo |grep -q "sse2" && echo "-msse2")
OPTS += $(shell cat /proc/cpuinfo |grep -q "sse3" && echo "-msse3")
OPTS += $(shell cat /proc/cpuinfo |grep -q "sse4" && echo "-msse4")
from srandom.
Scoobz How are things working out?
from srandom.
2.x has been released, so closing this issue.
from srandom.
Related Issues (13)
- Fix installation for systems without /etc/modules-load.d/ (for example Debian)
- Add support for USB true hardware RNG. HOT 1
- Not random on my system HOT 9
- Is srandom safe to generate large encryption keys? HOT 7
- Crashes whole system on Raspbery PI 4 with debian Buster HOT 15
- Cannot login via GDM after install srandom HOT 2
- Module srandom increases system load by 1 HOT 2
- Multithreaded implementation: Make srandom the hyperfastest /dev/urandom PRNG available HOT 13
- Multithreading, or to get greater than 2-3GiB/s performance? HOT 11
- do_exit is no longer exported HOT 8
- srandom is not secure (README.md bugs) HOT 9
- Help me to implement this module on android HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from srandom.