Comments (17)

cppforlife avatar cppforlife commented on August 10, 2024

from bosh-linux-stemcell-builder.

evandbrown avatar evandbrown commented on August 10, 2024

@alexwo do we know why performance is better on AWS by default? Is it possible that the AWS stemcell is installing rng-tools already? Are you using the same stemcell when comparing between cloud providers?

@cppforlife is there a reason we shouldn't just bake this in via one of the Google stages?

bkrannich avatar bkrannich commented on August 10, 2024

Pinging @voelzmo as well.

metron2 avatar metron2 commented on August 10, 2024

Running it as a daemon in a stemcell isn't ideal. The service performs FIPS validation of the entropy source and occasionally will kill itself as a result.
https://github.com/nhorman/rng-tools/blob/master/rngd.c#L345
Because of this you either have to set ignorefail or add it as a bosh addon so it's monitored.

xoebus avatar xoebus commented on August 10, 2024

I would prefer that this was in the stemcell on the IaaSes which support this.

The entropy pool will be at its smallest when the machine boots up. This means that the agent and other services which run at boot are the most at risk from uninitialized reads. Running this as an add-on would cause the benefits of this to only happen once we don't really need them anymore. Stemcells are particularly at risk because they are often recreated rather than rebooted so no entropy can be kept from previous shutdowns.

It might be enough to feed some /dev/hwrng output (if present) into /dev/random on boot rather than installing a service. Applications can then read random bytes or seed themselves from /dev/urandom or getrandom(2).
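
A minimal Python sketch of that boot-time idea, under stated assumptions: the helper names mix_hwrng_into_pool and read_seed are hypothetical, the paths are the usual Linux device nodes, and a plain write to /dev/random only mixes the bytes into the pool without crediting the kernel's entropy estimate (crediting needs the privileged RNDADDENTROPY ioctl, which is what rngd uses).

```python
import os


def mix_hwrng_into_pool(nbytes=512, source="/dev/hwrng", sink="/dev/random"):
    """Copy up to nbytes from the hardware RNG device into the kernel pool.

    Writing to /dev/random mixes the data in but does not raise the
    kernel's entropy count. Returns the number of bytes actually read.
    """
    # Unbuffered reads avoid pulling more from the device than requested.
    with open(source, "rb", buffering=0) as src:
        data = src.read(nbytes)
    with open(sink, "wb", buffering=0) as dst:
        dst.write(data)
    return len(data)


def read_seed(nbytes=32):
    """Read seed material the way an application would after boot."""
    if hasattr(os, "getrandom"):
        # getrandom(2) blocks until the kernel pool has been initialized.
        return os.getrandom(nbytes)
    # Fallback for platforms without getrandom(2).
    return os.urandom(nbytes)
```

Something like mix_hwrng_into_pool() would have to run from an early boot script to help the agent and other boot-time services; read_seed(32) is what a consumer would call afterwards.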

metron2 avatar metron2 commented on August 10, 2024

Every public IaaS supports RDRAND, although you need hardware version 9 for ESXi to pass it through. It's available on Intel CPUs from Ivy Bridge onward. AMD added support in June 2015. ARM may not have it. I think VIA CPUs have it.

To check, you can look in /proc:

cat /proc/cpuinfo | grep rdrand
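
The same check can be scripted. A rough Python sketch, assuming the x86 /proc/cpuinfo layout where the feature list sits on a line starting with "flags"; cpu_has_flag is a hypothetical helper name, and it matches whole flag names rather than substrings:

```python
from pathlib import Path


def cpu_has_flag(flag, cpuinfo_text=None):
    """Return True if any CPU entry in /proc/cpuinfo lists the given flag."""
    if cpuinfo_text is None:
        cpuinfo_text = Path("/proc/cpuinfo").read_text()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # x86 kernels list CPU features under the "flags" key.
        if key.strip() == "flags" and flag in value.split():
            return True
    return False
```

For example, cpu_has_flag("rdrand") on the VM itself shows whether the hypervisor passes the instruction through.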

Since RDRAND is real hardware entropy it is preferable to a CSPRNG like /dev/urandom. Having rngd running ensures that your CSPRNG internal state is constantly being modified. Attacks against CSPRNG implementations work by observing the output and guessing / calculating the state.

https://stackoverflow.com/questions/26771329/is-there-any-legitimate-use-for-intels-rdrand
https://software.intel.com/en-us/articles/intel-digital-random-number-generator-drng-software-implementation-guide

Looking at the weaknesses of PRNGs - https://www.schneier.com/academic/paperfiles/paper-prngs.pdf

Assuming a CSPRNG is merely applying recommendation 1 (and maybe 2) from 4.1 to a PRNG, it's much better to look at 3 & 4 as well. Feeding a TRNG into the kernel random pool is a great way to accomplish this.

If you include rng-tools then in the worst case if there is no /dev/hwrng device rng-tools will not start. The problem we've observed with long running stemcells making heavy use of /dev/random is that you start to get failures and the daemon stops.

rng-tools tends to start very early since it doesn't need the network - just the /dev/hwrng device.

xoebus avatar xoebus commented on August 10, 2024

> Since RDRAND is real hardware entropy it is preferable to a CSPRNG like /dev/urandom.

Applications and libraries are going to read from /dev/random and /dev/urandom whether we want them to or not. Can we use rngd to feed entropy from the HWRNG into the kernel pools so we can help everyone?

mlmitch avatar mlmitch commented on August 10, 2024

I think you and @metron2 agree on that.

> If you include rng-tools then in the worst case if there is no /dev/hwrng device rng-tools will not start. The problem we've observed with long running stemcells making heavy use of /dev/random is that you start to get failures and the daemon stops.

I think that is where you differ. He's saying that rng-tools will just fail if there isn't enough hardware entropy and won't restart. He wants it as a bosh add-on so there is monitoring and restarting capability.

xoebus avatar xoebus commented on August 10, 2024

The init system on the stemcell will make sure to keep it alive but I'm assuming that you'd like to be alerted that the daemon has died so that you can investigate why? That makes a lot of sense.

I'm worried about the quality of the entropy on stemcells just after boot. This wouldn't be affected by starting rngd as an addon because it would be starting way after all the system services and the agent. Is there a way we can both start it as soon as possible and also get alerting when it fails?

xoebus avatar xoebus commented on August 10, 2024

/cc @rohitkhera

metron2 avatar metron2 commented on August 10, 2024

There is probably a way to get it to restart, but it doesn't out of the box with Ubuntu 14.04. When the daemon fails you have to manually run 'service rng-tools start' to start it again. Our workload hung without the entropy, so we decided to do a bosh addon instead.

rohitkhera avatar rohitkhera commented on August 10, 2024

Cross posting from email -

Wanted to jump in for a minute to clarify the issue and raise some points for folks operating in regulated environments or other environments that may require high levels of review.

Notionally, there's a difference between the Linux RNG's (LRNG) entropy estimation heuristic, described in section 2.4 of https://eprint.iacr.org/2006/086.pdf, and the more orthodox min-entropy or Shannon entropy favored by NIST. This real distinction often leads down ratholes, so for simplicity let's assume that the LRNG heuristic is as conservative as the "min-entropy" of the system. Though the total LRNG is very much a hybrid system in the spirit of the constructions defined in this draft https://csrc.nist.gov/csrc/media/publications/sp/800-90c/draft/documents/draft-sp800-90c.pdf, let's for simplicity assume that the urandom pool acts as a DRBG (i.e. there is some pseudorandom / deterministic aspect and it's not just a noise source); e.g. the mixing function cyclically generates elements in an extension field based on polynomials primitive in GF(2^32). Also let's say that urandom is secure in the sense that output hashing provides backtracking resistance whereas constant entropy refresh into the pool provides prediction resistance (I have attempted to provide more information here: https://ringcipher.com/2013/05/04/uncertainty-randomness-virtualization/).

The NIST guidance around instantiation strength is based on a set of criteria. The instantiation strength of the Hash_DRBG, for instance, is defined by the pre-image resistance of the hash function. If I wish to generate a 256-bit AES key with this DRBG, I need to use at least SHA-256. Correct instantiation also requires that the seed for the DRBG have entropy at least equal to the instantiation strength, so going with the 256-bit AES key example, the entropy of the seed should be at least 256 bits. If the entropy of the seed is only 60 bits, for example, then this opens up the possibility of finding the AES key in 2^60 work, which would reduce the effective strength of the AES key. This is mostly well known; I'm repeating it here since I want to make sure the issue is outlined accurately.
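
The arithmetic behind that example can be written out. A small Python sketch, reading the example as a seed carrying 60 bits of entropy behind a 256-bit key; both function names are hypothetical, and the model is the simple "weakest link" one, where an attacker enumerates seeds instead of keys:

```python
def effective_strength_bits(key_bits, seed_entropy_bits):
    """Strength of a derived key is capped by the entropy of the seed."""
    return min(key_bits, seed_entropy_bits)


def brute_force_work(bits):
    """Worst-case number of guesses needed to exhaust a space of that size."""
    return 2 ** bits


# A 256-bit AES key generated from a DRBG seeded with only 60 bits of entropy:
weak = effective_strength_bits(256, 60)
print(weak, brute_force_work(weak))
```

With a seed of at least 256 bits of entropy, effective_strength_bits(256, 256) stays at 256 and the key keeps its nominal strength.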

We have known for a while that the LRNG acquires entropy more slowly in virtual environments. For example, pg. 129 of this BSI study https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Publikationen/Studien/ZufallinVMS/Randomness-in-VMs.pdf notes about 33 seconds (past boot) for 128 bits of entropy accumulation on KVM, whereas the system and userland daemons such as openssh boot in about 2 secs - so this is a problem. From what I can tell getrandom() goes to the LRNG pools so it does not alleviate the problem, and as Chris has stated, "Applications and libraries are going to read from /dev/random and /dev/urandom whether we want them to or not". Concerning the remark from the issue, "Can take 20-30 minutes to complete on GCP and 2-5 minutes on Azure and AWS", I take this to mean that in these environments it takes longer for the LRNG to acquire 128 bits of entropy compared to what is described in the BSI study.
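
One rough way to observe this on a given VM is to poll the kernel's own estimate right after boot. A Python sketch, assuming the standard /proc/sys/kernel/random/entropy_avail file; the function names and the 128-bit default are illustrative choices, and newer kernels may report a constant value once the pool is initialized, in which case the measurement says little:

```python
import time
from pathlib import Path

ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")


def read_entropy_bits(path=ENTROPY_AVAIL):
    """Return the kernel's current entropy estimate in bits."""
    return int(Path(path).read_text().strip())


def seconds_until_entropy(target_bits=128, timeout=60.0, poll=0.1,
                          read=read_entropy_bits):
    """Poll until the estimate reaches target_bits.

    Returns the elapsed seconds, or None if the timeout expires first.
    The read argument is injectable so the loop can be exercised
    without touching /proc.
    """
    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        if read() >= target_bits:
            return elapsed
        if elapsed >= timeout:
            return None
        time.sleep(poll)
```

Run early in boot, seconds_until_entropy() gives a number that can be compared across IaaSes or against the figure from the BSI study.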

Also, switching away from the above discussion, the issue has a code sample around the use of SecureRandom in java.security. There is some confusion around how this mechanism should work, since SecureRandom provides a generic interface and the underlying source could be the LRNG, SHA1PRNG, etc. It is not necessary for an app to constantly seek randomness from the LRNG: if SecureRandom is used to instantiate a DRBG, the LRNG pools should be used during DRBG instantiation, at the start and end of some finite "re-seed interval" for the DRBG, or in the event that the DRBG state becomes known. The recommended way is to correctly seed a NIST SP 800-90Ar1 DRBG where the entropy of the seed is commensurate with the instantiation strength of the DRBG. It's not clear that the app is doing this; it is possible in Java 9, e.g. through java.security.DrbgParameters.
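
To make the "seed once, reseed at an interval" pattern concrete, here is a toy Python sketch. It is modeled on the HMAC_DRBG construction from SP 800-90A but has not been validated against test vectors, it is not the Java API mentioned above, and the class name, seed length, and interval are all assumptions chosen for illustration:

```python
import hashlib
import hmac
import os


class ReseedingHmacDrbg:
    """Toy generator modeled on HMAC_DRBG with SHA-256 (unvalidated sketch)."""

    def __init__(self, entropy_source=os.urandom, seed_len=32,
                 reseed_interval=1000):
        self._entropy_source = entropy_source
        self._seed_len = seed_len  # 32 bytes = 256 bits of seed material
        self._reseed_interval = reseed_interval
        self._key = b"\x00" * 32
        self._value = b"\x01" * 32
        # Instantiate from entropy input plus a nonce (two reads of the source).
        self._update(entropy_source(seed_len) + entropy_source(seed_len // 2))
        self._requests = 0

    def _hmac(self, data):
        return hmac.new(self._key, data, hashlib.sha256).digest()

    def _update(self, data=b""):
        # State update step: fold optional data into key and value.
        self._key = self._hmac(self._value + b"\x00" + data)
        self._value = self._hmac(self._value)
        if data:
            self._key = self._hmac(self._value + b"\x01" + data)
            self._value = self._hmac(self._value)

    def reseed(self):
        """Pull fresh entropy from the source and reset the request counter."""
        self._update(self._entropy_source(self._seed_len))
        self._requests = 0

    def generate(self, nbytes):
        """Return nbytes of output, reseeding first if the interval is used up."""
        if self._requests >= self._reseed_interval:
            self.reseed()
        out = b""
        while len(out) < nbytes:
            self._value = self._hmac(self._value)
            out += self._value
        self._update()  # advance state so earlier output cannot be recomputed
        self._requests += 1
        return out[:nbytes]
```

The point of the shape is that the kernel pool (here os.urandom) is only touched at instantiation and at each reseed, not on every generate() call.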

mlmitch avatar mlmitch commented on August 10, 2024

I think @metron2 is talking about the "rngd too many fips failures" error. This causes the rngd daemon to stop and not come back.

The daemon can be started with the --ignore option, which would prevent this scenario. This may be a problem for users that require FIPS verification. It will be a big improvement over nothing though.

Haveged could also be installed for the edge case of machines that don't have the necessary hardware support. There is contention over whether this actually produces good randomness. However, the haveged documentation seems to imply it won't hurt randomness if rng-tools is in use. Seems to be another "better than nothing" scenario.

I think there is enough agreement for the following actions to be taken:

  • Ship the base stemcell with rng-tools and haveged installed
  • Ensure the rngd daemon is started with the --ignore option

Also, for your information, there is a similar open request for Kubernetes.

CAFxX avatar CAFxX commented on August 10, 2024

Independently, we were working on a haveged boshrelease to solve part of the same problem.

metron2 avatar metron2 commented on August 10, 2024

It seems like this got forgotten, so I opened a PR with the necessary changes to add this to the Ubuntu base image. It's already included in CentOS. #113

voelzmo avatar voelzmo commented on August 10, 2024

Yeah, I think this was pretty much dropped when xenial stemcells were released. They offered a much better pre-filled RNG, so issues with randomness (especially during startup and secret generation) were no longer that big of a problem.
Happy to finally see this moving forward!

metron2 avatar metron2 commented on August 10, 2024

This can be closed now that #134 and #113 are merged.
