
Minions


TL;DR

Minions is a filesystem-based, microservice-oriented security scanner. It supports distributed security checks, isolates testing from data access via gRPC, can be easily extended, and is privacy-mindful.

High-level schema of Minions

Status

We are actively open-sourcing existing code and building new code, but the project has yet to reach its first full release (0.1).

Full roadmap here.

Why does this project matter

Unlike traditional on-host security scanners, Minions minimizes the amount of code that needs to be executed on the target, and it's very easy to implement a new Goblin for a specific environment. All the complex logic is in the Minions, and users can maintain control of what goes where by running their own Goblin and Overlord.

Minions (the scanners) also readily support non-public scanners: adding a new tester built on custom technology is as easy as implementing a well-defined gRPC API.

Minions is not meant to be a full end-to-end solution on its own: there is no fancy UI and there are no dashboards. It will, however, generate accurate findings that you can ingest into any other system, quickly and at scale. It is likely most useful if you run a large infrastructure.

Getting started

You can try the project by running everything on your local box.

  1. Install the latest version of bazel. There are handy packages for most platforms.
  2. Check out the project.
  3. Run the backend scanning services locally via the execute_local.sh bash script.
  4. Scan your local machine by running the following from the src directory:
bazel run //go/goblins -- --overlord_addr=localhost:10001

Core concepts

Much like ancient Gaul, a Minions infrastructure is divided into three main components: the Goblins, the Overlord and the Minions.

A Goblin is responsible for data access: it reads filesystem data and metadata and makes it available to the scanners. A Goblin is entirely independent from the rest of the scanning infrastructure, and as such can make privacy-preserving decisions: for example, never letting the scanners access a home directory or source code.

A Minion proper (I know, the project is also called Minions) is the actual scanner. It receives file data and metadata, does whatever analysis it needs to, and returns any vulnerabilities it finds. A Minion has only as much context about the target of a scan as it specifically asks for (more on this below). This keeps Minions laser-focused on the task of detecting vulnerabilities, without the classic overhead that comes with even a trivial scanner.

Finally, the Overlord is the orchestrator of the infrastructure, in charge of managing incoming scan requests, routing them to Minions and so forth.
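
The split of responsibilities can be sketched with a few hypothetical Go interfaces. The real contracts are gRPC services defined in the project's protos; every name below is illustrative, not the actual API.

	// Hypothetical interfaces illustrating who does what. The real contracts
	// are gRPC services defined in the repository's protos.
	package sketch

	// Interest, File and Finding stand in for the real protobuf messages.
	type Interest struct {
		PathRegexp     string // which paths a Minion wants to see
		ContentsNeeded bool   // whether it needs file contents or just metadata
	}

	type File struct {
		Path     string
		Metadata map[string]string
		Data     []byte
	}

	type Finding struct {
		Resource    string
		Description string
	}

	// A Goblin reads files and metadata from the target and decides what it is
	// willing to expose to the rest of the infrastructure.
	type Goblin interface {
		Scan(overlordAddr string) error
	}

	// A Minion analyzes file data and metadata, reports vulnerabilities and may
	// ask for more files via new Interests.
	type Minion interface {
		InitialInterests() []Interest
		AnalyzeFiles(files []File) (findings []Finding, newInterests []Interest, err error)
	}

	// The Overlord routes files coming from Goblins to the Minions that asked
	// for them and aggregates the results.
	type Overlord interface {
		Interests() []Interest
		Dispatch(files []File) (findings []Finding, newInterests []Interest, err error)
	}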

Interests

Separating data-gathering and actual testing of the data seems like a good idea on paper, but in practice has (at least) two main problems:

  • It can be unnecessarily expensive, as a lot of data that may or may not be useful needs to be gathered upfront.
  • The set of data to gather might be different depending on properties of the data itself. For example, a config file at a standard location might point to another configuration file in some directory hidden in a dark corner of the disk.

Minions solves this problem with the use of Interests. An interest is a way for a Minion to tell a Goblin what it cares about at a given moment. All Minion instances start with a set of initial Interests they always care about, but the list is iteratively updated as they process files they have ingested.

The way this works in practice is that every time a Goblin sends files to an Overlord, it waits until the backend Minions have processed them and is then served back a new list of files to provide, and so on until all Minions have completed.
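
In rough Go, reusing the hypothetical types from the sketch under Core concepts, the Goblin's side of that exchange could look like the following. This illustrates the control flow only, not the actual gRPC API.

	// runScan sketches the iterative exchange: ask for the current interests,
	// gather the matching files (subject to the Goblin's own privacy policy),
	// send them to the Overlord, and repeat with whatever new interests the
	// Minions produce, until they stop asking for more.
	func runScan(o Overlord, gather func([]Interest) []File) ([]Finding, error) {
		var all []Finding
		interests := o.Interests() // the Minions' initial interests
		for len(interests) > 0 {
			files := gather(interests)
			findings, newInterests, err := o.Dispatch(files)
			if err != nil {
				return nil, err
			}
			all = append(all, findings...)
			interests = newInterests
		}
		return all, nil
	}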

Building and running

Minions is a set of microservices. You'll have to run at least two components to get anything useful: an Overlord, and one or more Minions.

Minion

Start by running one or more Minions; you can run as many replicas of a Minion as you want to spread the load.

Each Minion carries its own set of flags and configs, but all of them need to have the Overlord pointed at them, so they should be the first thing to start up.

Minions have a runner package that can be used to execute them. Assuming you want to run the vulners Minion, you'd use the following, which starts the Minion on localhost, port 20001.

  bazel build //go/minions/vulners/runner
  ./bazel-bin/go/minions/vulners/runner/linux_amd64_stripped/runner

Replicas

If you run more than a single replica of a minion, and if a minion keeps state, you'll want to have a shared backend.

TODO(paradoxengine): explain how one can know and what to do :)

Overlord

Once a minion is running, you can start the overlord, the orchestrator of the system.

The Overlord expects to be told where its Minions are. Today, this is done simply by passing the address of each Minion as a flag. The Overlord will then register with them and get ready to serve data.

Assuming you have a minion running on localhost port 20001 (the default when you run one), you'd start the Overlord as follows.

  bazel build //go/overlord/runner
  ./bazel-bin/go/overlord/runner/linux_amd64_stripped/runner --minions=localhost:20001

If you have more Minions, just add more --minions flags (for example, --minions=localhost:20001 --minions=localhost:20002).

Minion details: Vulners

The Vulners minion parses package databases on Linux systems to identify outdated software that carries security vulnerabilities. To do so, it needs to parse the RPM database, which it does using the RPM libraries. Sadly, this means the build is not hermetic, as the system has to provide the rpm library. On Debian/Ubuntu systems, that means you need the librpm-dev package installed to build it.

Goblin

Once you have the address of an Overlord with a set of working Minions, you can run a Goblin to feed data to it.

The simplest Goblin available is the Local Goblin, which fetches files from the filesystem of the box it runs on. To run it, enter the main src directory and run the following:

bazel run //go/goblins:goblins

Developers

We warmly welcome contributions, in particular of additional detectors (which are hopefully fairly easy to write once you get the hang of the APIs). Please read the contributing policy first.

Build environment

Minions has been developed using Bazel, an open-source build system by Google. Bazel can compile multiple languages with a common idiom across platforms, which is a nice property to have for Minions.

The Go code also builds and runs with the native Go toolchain. In fact, you can have both working at the same time, which is particularly useful if you want to develop with something like VS Code, with two tricks:

  • Have a symlink at src/github.com/google/minions inside your GOPATH pointing to the src directory where the code is checked out.
  • Set your GOPATH to also include /src/bazel-bin/gopath, which is where the Go dependencies are copied by the gopath target (see below).

Now, simply build the gopath target with Bazel:

bazel build //:gopath

Notice of affiliation

This is not an official Google product.

Contributors

empijei, paradoxengine, whuang8


Issues

Build an overlord-minions caching layer

There is no point in sending the same file to a minion over and over.
The overlord can simply remember some facts about a file (say, metadata-backed hashes) and decide not to send the file to a minion for analysis at all, simply returning the cached data (see the sketch after the design notes).

Design notes:

  • What is the minimum DNA required here? Hashing will take care of "data" (file contents), but clearly metadata might play a role in the findings (e.g. executable bit, file position on disk, etc.).
  • Metadata-only files might still benefit from caching (e.g. malware-scanner minions looking for specific hashes).
  • Down the line, we might want to have minions specify their own "caching rules" telling the overlord what they can cache.
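
A minimal sketch of what such a cache could look like on the overlord side, assuming findings are keyed per minion on a content hash plus whatever metadata influences the result. All names are hypothetical and findings are stored as opaque bytes:

	package sketch

	import (
		"crypto/sha256"
		"encoding/hex"
		"sync"
	)

	// findingsCache remembers the findings a minion produced for a given file
	// "DNA" (content hash plus the metadata that influenced the result), so the
	// same file does not have to be shipped to the same minion twice.
	type findingsCache struct {
		mu      sync.RWMutex
		entries map[string][]byte // serialized findings, keyed by minion + DNA
	}

	func newFindingsCache() *findingsCache {
		return &findingsCache{entries: make(map[string][]byte)}
	}

	// dna builds the cache key from the minion name, the file contents and the
	// metadata that can change the outcome (executable bit, path, and so on).
	func dna(minion string, contents []byte, metadata string) string {
		h := sha256.New()
		h.Write(contents)
		h.Write([]byte(metadata))
		return minion + ":" + hex.EncodeToString(h.Sum(nil))
	}

	func (c *findingsCache) get(key string) ([]byte, bool) {
		c.mu.RLock()
		defer c.mu.RUnlock()
		f, ok := c.entries[key]
		return f, ok
	}

	func (c *findingsCache) put(key string, findings []byte) {
		c.mu.Lock()
		defer c.mu.Unlock()
		c.entries[key] = findings
	}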

Minor issues in overlord

These are just some comments from an initial general review.

Package structure

Packages should be named after their functions but the directory tree should reflect dependencies. The overlord/interests package is used by packages outside overlord. Since it is a shared util it might be worth considering moving it one level up to highlight what is shared across packages.

Overlord

Naming convention

File: ./overlord/overlord_test.go: underscores in names are discouraged.

Implement a deadline

Line: ./overlord/overlord.go:116: // TODO(paradoxengine): most likely, a deadline here?

Some dummy (untested) code to do it:

	var interests []*mappedInterest
	for name, m := range minions {
		// Run the RPC in a goroutine so we can enforce a deadline on it. Note
		// that on timeout the goroutine keeps running until the RPC returns;
		// wrapping ctx with context.WithTimeout would cancel the call as well.
		var intResp *pb.ListInitialInterestsResponse
		var err error
		done := make(chan struct{})
		go func() {
			intResp, err = m.ListInitialInterests(ctx, &mpb.ListInitialInterestsRequest{})
			close(done)
		}()
		select {
		case <-time.After(5 * time.Second):
			// Timed out: skip this minion and move on to the next one.
			continue
		case <-done:
		}
		if err != nil {
			return nil, err
		}
		for _, v := range intResp.GetInterests() {
			interests = append(interests, &mappedInterest{
				interest: v,
				minion:   name,
			})
		}
	}
	return interests, nil

copy is a shallow copy

Line: ./overlord/overlord.go:143: copy(scanState.interests, s.initialInterests)

Beware: this is a shallow copy, and it might behave weirdly if the copied slice elements are later modified.
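
A hedged sketch of the deeper copy the comment is hinting at, assuming the slice holds pointers; mappedInterest here is a stand-in for the real overlord type:

	package sketch

	// mappedInterest stands in for the real type; the actual struct also
	// carries a pointer to an Interest proto, elided here.
	type mappedInterest struct {
		minion string
	}

	// cloneInterests returns a slice whose elements are independent copies of
	// the originals. copy(dst, src) on a slice of pointers only copies the
	// pointers, so both slices keep referring to the same underlying structs.
	func cloneInterests(src []*mappedInterest) []*mappedInterest {
		dst := make([]*mappedInterest, 0, len(src))
		for _, i := range src {
			c := *i // value copy; any nested pointers would still be shared
			dst = append(dst, &c)
		}
		return dst
	}

Something like scanState.interests = cloneInterests(s.initialInterests) would then replace the copy call, at the cost of allocating new elements per scan.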

Potential race condition

grpc.Serve might spawn several goroutines at the same time, so it might be worth protecting some calls with a mutex (add a --gotsan or --race flag during tests to enable the race detector).

For example the following field is both written and read after server initialization:
Line: ./overlord/overlord.go:34: scans map[string]state

Reads:
overlord/overlord.go:161: scan, ok := s.scans[req.GetScanId()]
overlord/overlord.go:176: scan, ok := s.scans[req.GetScanId()]

Write:
overlord/overlord.go:150: s.scans[scan.ScanId] = scanState

Native Go maps are not safe for concurrent use; protect the map with a sync.RWMutex or use a sync.Map instead.
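
A minimal sketch of that suggestion, assuming a server struct roughly shaped like the one in overlord.go (field and method names are illustrative):

	package sketch

	import "sync"

	type state struct{} // placeholder for the per-scan state kept by the overlord

	// server guards the scans map with a RWMutex, since gRPC handlers may read
	// and write it concurrently from several goroutines.
	type server struct {
		mu    sync.RWMutex
		scans map[string]state
	}

	func (s *server) getScan(id string) (state, bool) {
		s.mu.RLock()
		defer s.mu.RUnlock()
		scan, ok := s.scans[id]
		return scan, ok
	}

	func (s *server) putScan(id string, scan state) {
		s.mu.Lock()
		defer s.mu.Unlock()
		s.scans[id] = scan
	}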

Nits:

  • Line ./overlord/overlord_test.go:90 (// TOOD(paradoxengine): the overlord still needs plenty of unit tests): typo in TODO.
  • Same line: since plenty of unit tests are still needed, maybe take a look at the cover tool.

Vulners API auto-throttling

Hey-ho!
We've run into some issues with projects using our API.
Like trying to make 1000 rps against our backend using our official Python API :)
So I've added an auto-throttling mechanism using two HTTP headers on the backend and a kind of semaphore in the Python API that acts on them.
In the response from the backend you can see:
X-Vulners-Ratelimit-Reqlimit = tells the client the rate limit. Float.
X-Vulners-Ratelimit-Rate = tells the client its current rate. Float.
So avoiding the banhammer is quite easy: just respect these values and throttle down if the limit is near :)

I've added the throttling functionality in this commit.
May I suggest following it in the Go lib? It's not urgent, but definitely good stuff :)
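
A hedged sketch of how a Go client could act on those two headers. The header names come from the description above; the 90% threshold and the one-second pause are arbitrary illustration choices, not part of the Vulners API:

	package sketch

	import (
		"net/http"
		"strconv"
		"time"
	)

	// throttle reads the Vulners rate-limit headers from a response and, if the
	// current rate is getting close to the allowed limit, pauses briefly before
	// the next request is issued.
	func throttle(resp *http.Response) {
		limit, errLimit := strconv.ParseFloat(resp.Header.Get("X-Vulners-Ratelimit-Reqlimit"), 64)
		rate, errRate := strconv.ParseFloat(resp.Header.Get("X-Vulners-Ratelimit-Rate"), 64)
		if errLimit != nil || errRate != nil || limit <= 0 {
			return // headers missing or malformed; nothing to act on
		}
		if rate > 0.9*limit {
			time.Sleep(time.Second)
		}
	}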

Build an npm scanner

Extracting installed npm packages should be relatively simple, and then they can be checked for vulnerabilities.
package.json is the obvious starting point.

npm list -g --depth=0 is the canonical way to get the view from the command line, but subdirectories of node_modules should work as well.

We'd have to figure out how to identify vulnerable ones, but there is a fairly large ecosystem for that as well.
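
For the package.json route, pulling out the declared dependencies is only a few lines of Go. A rough sketch, with version resolution and the actual vulnerability lookup left out:

	package sketch

	import "encoding/json"

	// packageJSON captures the fields an npm minion would care about.
	type packageJSON struct {
		Name            string            `json:"name"`
		Version         string            `json:"version"`
		Dependencies    map[string]string `json:"dependencies"`
		DevDependencies map[string]string `json:"devDependencies"`
	}

	// parsePackageJSON returns the declared dependencies (name -> version range)
	// from the raw contents of a package.json file.
	func parsePackageJSON(data []byte) (map[string]string, error) {
		var p packageJSON
		if err := json.Unmarshal(data, &p); err != nil {
			return nil, err
		}
		deps := make(map[string]string, len(p.Dependencies)+len(p.DevDependencies))
		for name, version := range p.Dependencies {
			deps[name] = version
		}
		for name, version := range p.DevDependencies {
			deps[name] = version
		}
		return deps, nil
	}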

Add a minion to scan redis configuration

Seems like a configuration scanner would be a good idea, and redis might be a good start.
We could check for (a rough sketch follows the list):

  • a world-readable config file (since it stores the password)
  • requirepass not set (maybe only in conjunction with being bound to something other than localhost)
  • the presence of a rename-command for CONFIG (though this seems more like hardening?)
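
A rough sketch of the first two checks, assuming the minion is handed the file's permission bits and contents by the goblin; the helper name and the finding strings are made up:

	package sketch

	import (
		"bufio"
		"bytes"
		"os"
		"strings"
	)

	// checkRedisConf returns human-readable findings for a redis.conf, given its
	// permission bits and contents. Only the first two checks are sketched.
	func checkRedisConf(mode os.FileMode, contents []byte) []string {
		var findings []string
		if mode.Perm()&0004 != 0 {
			findings = append(findings, "redis.conf is world readable (it may contain requirepass)")
		}
		hasPass := false
		scanner := bufio.NewScanner(bytes.NewReader(contents))
		for scanner.Scan() {
			line := strings.TrimSpace(scanner.Text())
			if strings.HasPrefix(line, "requirepass ") {
				hasPass = true
			}
		}
		if !hasPass {
			findings = append(findings, "no requirepass set; risky if redis is bound to a non-localhost address")
		}
		return findings
	}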

Build a Vulners-RPM minion

Turns out that reading Berkeley DB (which is what the RPM db is stored in) from Go is surprisingly hard.
It seems reasonable in Java instead (famous last words).
It's a good excuse to showcase minions written in different languages interoperating with each other, so it might be worth building a Java minion to support that. It does mean re-implementing the vulners APIs and so on, but since I've done that already it should be reasonably easy.

Add hipster definitions

Apparently you can't be a respectable project without at least the following:

Roadmap.md
Code-of-conduct.md
bill-of-materials.json
lgtm config
travis config

Build a Virus-Total minion

Sending binaries to virus total seems like a reasonable thing to do.

The public API is documented here: https://www.virustotal.com/es/documentation/public-api/
The first scan is bound to take a long while as the minion builds a cache of "harmless files" with all the binaries found in a healthy OS, but after that it should be reasonably fast.
I wonder if there is a way to optimize this entirely by having some form of pre-check on the hashes of well-known good binaries. It seems a trivial enough idea that I'm sure it must exist already; otherwise, maybe it would be an interesting project to just download all the rpms/debs and generate hashes of those binaries?

Of course, this plays very nicely with the idea of controlling the scope of goblins regarding the binaries they serve.

Build a jenkins goblin

Jenkins pipelines seem like a good first use case for an actually useful Goblin.
It could be used to scan a Docker image, for example.

Vulners minion: handle multiple os-release files

In modern Ubuntu systems it's fairly common to have multiple os-release files inside snaps.
These can be picked up by goblins, so either we need to lock down the regexp or we need to handle multiple submissions of os-release (possibly by making multiple calls with the various values, if they disagree).

Add a minion to scan for PHP misconfiguration

An endless classic: PHP has all sorts of potential configuration woes.

Allowing remote opens, globals (though these have luckily gone away), and an array of other things; we can probably check a hardening guide to make sure we're not missing anything major.

It's important to keep in mind we should only flag things that have a reasonable chance of being a real problem, not just far-fetched hardening improvements.

no such package '@com_google_protobuf//'

I tried to build an overlord:
$ bazel build //go/overlord/runner
Then I got this error:

ERROR: /home/user/goworkspace/src/github.com/google/minions/src/proto/overlord/BUILD.bazel:4:1: every rule of type proto_library implicitly depends upon the target '@com_google_protobuf//:protoc', but this target could not be found because of: no such package '@com_google_protobuf//': java.io.IOException: thread interrupted
ERROR: Analysis of target '//go/overlord/runner:runner' failed; build aborted: no such package '@com_google_protobuf//': java.io.IOException: thread interrupted
INFO: Elapsed time: 677.133s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (42 packages loaded)

Build a container goblin

The minimum usable system requires a goblin capable of handling container images.
I might want to start from raw Docker images or something more integrated, like Spinnaker. TBD.

Handle symlinks in goblins

Goblins are currently pretty dumb when it comes to symlinks, and we need a good strategy to handle that.
In particular I'm thinking about the "fake" chroot that using the "rootdir" directive provides, which is trivially broken by symlinks.

Migrate our logging to logrus

There was little point in not doing this from day zero, so we might as well migrate to logrus while we're at it.
https://github.com/sirupsen/logrus

I've looked briefly at other structured logging libraries (it's a bit sad there is no clear standard in Go yet) and this seems like the best option.

Optimize local goblin by path search

The local goblin currently walks the entire hard drive, but it should really look at the prefixes of the interests' regexps and bail out early if a prefix (e.g. the root directory) does not match.

In addition, it should only walk the filesystem once, matching against all the interests it knows about at that point; after all, the overlord will take care of routing the files. A rough sketch of the prefix idea is below.
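
A rough illustration of the prefix idea, using the literal prefix Go's regexp package can extract from a pattern and pruning directories that cannot possibly match it. This is not the goblin's actual code:

	package sketch

	import (
		"io/fs"
		"path/filepath"
		"regexp"
		"strings"
	)

	// walkMatching walks root once and calls visit for every file that matches
	// at least one interest regexp, skipping whole directories that no literal
	// prefix could ever reach. Patterns with no literal prefix disable pruning.
	func walkMatching(root string, patterns []string, visit func(path string)) error {
		var res []*regexp.Regexp
		var prefixes []string
		for _, p := range patterns {
			re, err := regexp.Compile(p)
			if err != nil {
				return err
			}
			res = append(res, re)
			prefix, _ := re.LiteralPrefix()
			prefixes = append(prefixes, prefix)
		}
		return filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
			if err != nil {
				return err
			}
			if d.IsDir() {
				for _, prefix := range prefixes {
					if strings.HasPrefix(path, prefix) || strings.HasPrefix(prefix, path) {
						return nil // this directory may still contain matches
					}
				}
				return filepath.SkipDir // no interest can match anything below here
			}
			for _, re := range res {
				if re.MatchString(path) {
					visit(path)
					return nil
				}
			}
			return nil
		})
	}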
