racksec / desdemona
Data-backed security operations
License: Eclipse Public License 1.0
Dashboard is where visualizations live. For an MVP we should have one dashboard available to the user.
User should be able to:
We need to be able to ingest generic syslog log data.
We need to ingest alerts (chug?) from the FalconHose.
As @reaperhulk pointed out, this:
Load up env/dev/user.clj. Evaluate the go function.
... is not a very useful way to help newbies get started.
See: https://travis-ci.org/RackSec/desdemona/jobs/113481387#L279
Thanks to @ehashman for finding this.
We need to ingest alerts from Alert Logic.
Currently only Firefox is supported, because that was the only browser that was easy to get working in Travis. However, we should really also run the tests in Safari, Chrome, and maybe even Opera. This is well supported by Karma; we just need to wrangle the configuration.
Currently, the Kafka JVM attempts to pre-allocate 1024MB all by itself. This is problematic because the standard docker-machine VM only has 1024MB total, so the malloc fails, so the Kafka container falls over immediately. This is unrelated to #45 as far as I can tell (other than that it's also about Kafka).
lvh@zygalski ~/P/r/desdemona (master)> docker-machine create --driver virtualbox rax2 09:18:22
Running pre-create checks...
(rax2) Default Boot2Docker ISO is out-of-date, downloading the latest release...
(rax2) Latest release for github.com/boot2docker/boot2docker is v1.10.2
(rax2) Downloading /Users/lvh/.docker/machine/cache/boot2docker.iso from https://github.com/boot2docker/boot2docker/releases/download/v1.10.2/boot2docker.iso...
(rax2) 0%....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
Creating machine...
(rax2) Copying /Users/lvh/.docker/machine/cache/boot2docker.iso to /Users/lvh/.docker/machine/machines/rax2/boot2docker.iso...
(rax2) Creating VirtualBox VM...
(rax2) Creating SSH key...
(rax2) Starting the VM...
(rax2) Check network to re-create if needed...
(rax2) Waiting for an IP...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with boot2docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
Docker is up and running!
To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env rax2
lvh@zygalski ~/P/r/desdemona (master)> VboxManage showvminfo rax2 | grep Memory 09:20:52
Memory size: 1024MB
One way to mitigate this immediately would be to document it, so that people can just adjust their VM. Medium-term, it would make sense to get Kafka to try to allocate less memory, since I'm assuming lots of people will be using the default. Longer-term, it may be useful to make that amount of memory configurable; I could certainly see how a production setup would want more than 1024MB.
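For the documentation route, the workaround might look like this (a sketch, assuming the VirtualBox driver; the machine name and the 2048MB figure are illustrative):

```
# Recreate the docker-machine VM with more memory than the 1024MB default,
# so Kafka's 1024MB heap pre-allocation can succeed.
docker-machine rm rax2
docker-machine create --driver virtualbox --virtualbox-memory 2048 rax2
```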
Some of the template-generated code is quite duplicated. Some of it is not covered by tests and is only intended for REPL use, resulting in lower coverage (75% at the time of writing).
We need live style injection and some css preprocessor (preferably SASS) compilation for easier front-end development.
Sass compilation should be easy to implement:
https://github.com/bhauman/lein-figwheel/wiki/SASS-watcher
This has the advantage of a single config.edn that relies more on environment variables, rather than a number of different config files for different environments.
https://github.com/jonase/kibit
This is particularly useful because we have a lot of folks new to Clojure, and it will help them learn idioms more quickly. That can also be done through review, but using kibit instead makes better use of reviewer time.
Right now, docker-compose is called with the -d
(daemonize) flag. This prevents output from individual containers, e.g.:
test_1 |
test_1 | lein test desdemona.functions.sample-functions-test
test_1 |
test_1 | lein test desdemona.jobs.sample-job-test
test_1 |
test_1 | lein test desdemona.query-test
test_1 |
test_1 | lein test desdemona.tasks.kafka-test
test_1 |
test_1 | lein test desdemona.workflows.sample-workflow-test
test_1 |
test_1 | Ran 9 tests containing 23 assertions.
test_1 | 0 failures, 0 errors.
desdemona_test_1 exited with code 0
... from being displayed. That seems like incredibly useful information, and it should be part of the logs.
One way to do this might be to just background docker-compose with &, but it's not clear how useful that is with fd redirects + Travis.
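An alternative that avoids backgrounding entirely (a sketch; the test service name is taken from the compose output above):

```
# Run in the foreground so container output streams straight to the build log:
docker-compose up test

# ...or keep -d, but dump the captured container logs afterwards:
docker-compose up -d
docker-compose logs test
```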
We need to ingest data from CloudPassage.
Right now, query execution is quite literally arbitrary code execution. We should have a schema that specifies what queries are actually allowed.
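A hedged sketch of what such a schema check could look like (the operator whitelist and function names here are illustrative, not a proposed design):

```clojure
;; Only allow a small set of operators, plus (:keyword symbol) accessors;
;; anything else (e.g. arbitrary fn calls) is rejected before evaluation.
(def allowed-ops '#{= and or})

(defn valid-query?
  [form]
  (if (seq? form)
    (let [[op & args] form]
      (cond
        (contains? allowed-ops op) (every? valid-query? args)
        (keyword? op)              (symbol? (first args))
        :else                      false))
    true))

(valid-query? '(= (:ip x) "10.0.0.1"))   ;=> true
(valid-query? '(slurp "/etc/passwd"))    ;=> false
```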
Desdemona should have a DEVELOPING.md that includes:
Related to #17. The spike added in #27 requires the input query to know about featurec, which is probably a layer of abstraction below what we want to ask of our users. The original query I wrote when I wrote the test was:
'(== (:ip x) "10.0.0.1")
and the thing I changed it to was:
'(l/featurec x {:ip "10.0.0.1"})
The latter is obviously the correct thing from core.logic's perspective; the former isn't actually relational.
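For illustration, a toy rewrite from the user-facing form to the featurec form might look like this (a sketch, not the actual #27 code):

```clojure
;; (= (:ip x) "10.0.0.1")  ==>  (l/featurec x {:ip "10.0.0.1"})
(defn compile-eq
  [[_eq [k sym] value]]
  (list 'l/featurec sym {k value}))

(compile-eq '(= (:ip x) "10.0.0.1"))
;=> (l/featurec x {:ip "10.0.0.1"})
```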
Right now, the only way to do archival is Rackspace Cloud Files (see #2). That's unfortunate: it means that a development environment can't test those components without access to the network, and access to a specific service at that. It would be much preferable if you could just docker-compose up (#7) and have that do a thing all by itself. Simply making long-term storage unavailable is unacceptable: it is critical to how desdemona operates, so that would make a development environment unrealistic.
There are a few potential ways to resolve this:
Both have interesting trade-offs. Plan 1 is obviously nice because it opens up the possibility of other storage backends. If we go with plan 1, most of the work in plan 2 should still happen, albeit with a potentially lower priority: plan 2 is how you provide high-assurance testing for the Rackspace Cloud Files backend.
Plan 2 has a number of parts:
It's not clear if this should be done before or after #2, which is about adding support for Rackspace Cloud Files altogether.
Right now, docker build (and by extension docker-compose build) fails because lein deps tries to install all of the dependencies. That includes karma + plugins, recently introduced for Clojurescript.
Since the npm packages are only required for testing, it's probably best to just make sure that plain-old lein deps doesn't end up trying to install them. This would just mean splitting project.clj into several profiles.
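Splitting project.clj that way might look roughly like this (a hedged sketch; the plugin and dependency versions are illustrative, not what's actually in the repo):

```clojure
;; Move test-only front-end tooling under the :dev profile, so that a
;; plain `lein deps` in the Docker build skips npm entirely.
(defproject desdemona "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.8.0"]]
  :profiles
  {:dev {:plugins [[lein-npm "0.6.2"]]            ; npm-backed tooling, dev only
         :npm {:dependencies [[karma "0.13.22"]]}}})
```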
This was not evident from CI because docker-compose build is currently non-gating. It's non-gating because of #45.
It would be useful to know how long steps take. Onyx has native support for this.
Kibit has rules about two-form thread exprs. It complains about things like (->> f a) and (-> f a) and tells you to rewrite them as (f a). That certainly looks reasonable, but sometimes the threading macro really is what you want. For example, consider manifold.deferred's catch, which tells you to write:
(-> (form that throws)
    (md/catch Exception (handler)))
That's what upstream thinks looks nicest, and there appears to be consensus on our development team as well (@derwolfe, @lvh). The desugared form, (md/catch (form that throws) Exception (handler)), while obviously equivalent, is harder to read and understand; you want to think about what you're doing first before you think about error handling. The problem goes away when you add more fns to that pipeline; then kibit is happy with the form.
Potential solutions:
One option is to special-case md/catch in kibit's rules. This has limitations; AFAIK kibit doesn't really import your code, so it has to count on statically resolvable things to figure out what a particular fn is; and, right now, kibit works in a "peephole" fashion (that is, it only looks at one form at a time, so it would e.g. not be easy or pleasant to also look at an ns form).

#27 introduces something that runs core.logic. The query it introduces still specifies the namespace for core.logic explicitly, e.g. specifying l/featurec as the relation. However, that is a pure accident of how core.logic is specified in desdemona.query's :require. This may be a thing #29 solves by accident by introducing a compiler.
Right now, some files (e.g. utils.clj) have lines ending in whitespace. This was not caught by CI (kibit and eastwood rely on post-reader data structures, so they wouldn't notice; neither did cljfmt). I noticed this while trying to get the codebase into a state that kibit would pass.
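A quick check that would catch this (a sketch; the sample file is illustrative):

```shell
# Create a sample file with a trailing-whitespace line, then flag it.
printf 'foo \nbar\n' > /tmp/sample.clj
grep -nE '[[:space:]]+$' /tmp/sample.clj
```

grep exits non-zero when nothing matches, so the same invocation over the real source tree could double as a CI gate.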
This is part of the requirements set out in #17. It should probably be done after #29, since that will give us something to conjunct, and mean that we have a compiler to hook into. It should probably be done after we have some other property to extract from segments besides IPs (although for testing this we can just pretend that's the case right now).
This should look something like:
(and (= (:ip x) "10.0.0.1") (= (:type x) "egress"))
This maps internally to a single conjunctive conde clause, presumably.
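A minimal core.logic sketch of the conjunctive case (the event data and the use of implicit conjunction inside run* are illustrative, not desdemona's actual compiler output):

```clojure
(require '[clojure.core.logic :as l])

(def events [{:ip "10.0.0.1" :type "egress"}
             {:ip "10.0.0.2" :type "ingress"}])

;; Goals listed inside run* are implicitly conjoined, so both featurec
;; constraints must hold for an event to be returned.
(l/run* [x]
  (l/membero x events)
  (l/featurec x {:ip "10.0.0.1"})
  (l/featurec x {:type "egress"}))
;; only the first event satisfies both constraints
```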
We saw an example of this in our latest Travis build on my Docker PR (here), and I've also encountered this in my dev environment. We should track down what the problem is and fix it if possible.
Current workaround is to run docker-compose up
again until the Kafka container doesn't die.
All ClojureScript code is tested, but we don't measure its test coverage. This is primarily due to a limitation in cloverage. We should measure coverage for both Clojure and ClojureScript and merge the results.
A user needs a graphic interface to be able to display and filter ingested data.
The spike introduced in #27 splices events into the form as a macro. This is unfortunate from a performance perspective for when the events list is large.
This can be done by writing a relation, or possibly we can just add the events to the eval context.
EditorConfig appears to be the primary way you communicate how you want your files edited. It'd be nice if we had that in the repo and CONTRIBUTING.md mentioned its existence.
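A sketch of what that .editorconfig might contain (the settings are illustrative, not agreed-upon conventions):

```
root = true

[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true

[*.{clj,cljs,cljc,edn}]
indent_style = space
indent_size = 2
```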
It would be cool if someone would put together some documentation (that we could maintain) on the architecture of desdemona. Bonus points for diagrams?
The interface should support the user when typing a query by:
Any interaction that can be translated into the query language should be reflected in the query input.
This is part of the requirements set out in #17. It should probably be done after #29, since that will give us something to disjunct, and mean that we have a compiler to hook in to.
This should look something like:
(or (= (:ip x) "10.0.0.1") (= (:ip x) "10.0.0.2"))
This maps internally to conde, presumably.
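The disjunctive case, sketched with conde (illustrative, with the same caveats as the conjunction ticket):

```clojure
(require '[clojure.core.logic :as l])

(def events [{:ip "10.0.0.1"} {:ip "10.0.0.2"} {:ip "10.0.0.3"}])

;; Each conde clause is an alternative; an event matching either
;; featurec constraint is returned.
(l/run* [x]
  (l/membero x events)
  (l/conde
    [(l/featurec x {:ip "10.0.0.1"})]
    [(l/featurec x {:ip "10.0.0.2"})]))
;; matches the first two events but not the third
```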
We need to be able to filter stored data per UI requests.
This is related to #61. It may be nice to configure this since you may want the increased base amount of memory for production setups, but right now that's less pressing than the default docker-machine -> dev environment story being smooth and painless.
We currently have npm installed in our image, just for the build process (as of #78). But npm is not useful for actually running the software.
We should have separate "build" and "run" containers, such that the "run" container only includes what's necessary to run and not everything that's necessary to build.
Desdemona should be able to ingest Onyx segments through Kafka. This gives us a better separation of concerns for deliverability of e.g. syslog messages: syslog talks to Kafka (syslog-ng supports that), and Kafka talks to Onyx, which is already supported.
Kafka should probably piggyback on the Zookeeper cluster which Onyx is already running for us.
Now that we have EditorConfig, we should enforce it. https://github.com/jedmao/eclint appears to be the best tool to do this.
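The CI step might look like this (a sketch; eclint ships a check subcommand, and the globs here are illustrative):

```
npm install -g eclint
eclint check "src/**/*" "test/**/*"
```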
Desdemona is obviously a complex piece of kit. We should document what we think it looks like, and what assumptions we're making in its design, to make sure we're all on the same page.
We need to determine a standardized schema for all our ingested data from syslog.
In particular, we want to ensure that each log entry/alert (hereafter "event") is annotated with:
User should be able to save queries as visualizations for later use within dashboards.
At a very minimum, UI should:
This ticket is more of a brain dump and a roadmap/central discussion point for figuring out which subtickets should exist to get the query language we need.
At a very minimum, a query language should allow unification and disunification with literals; e.g. "where the IP is 12.34.56.78" or "where the IP is not 12.34.56.78". It should support that on arbitrary properties.
A query language should also support (arbitrarily nested) conjunctions and disjunctions of terms, e.g. "where (the IP is p.q.r.s AND we've seen lots of traffic for it) OR (the IP is h.i.j.k and traffic to that IP is unusual)".
Queries must be validated by a schema, so that we don't accidentally expose arbitrary computation.
A useful query language should support logic variables. After all, as a SOC analyst, I don't just want to check for a particular IP or subnet; I want queries like "where the IP is one of these, and that IP is designated as a potential threat" without having to specify that set literally; I just want to specify that we're talking about the same IPs p, q... This doesn't have to be part of the initial implementation; it is a nice-to-have.
For training purposes, it would also be very nice to have the option of "explaining" a query, and potentially even making it bidirectionally writable (this is an idea @smogg has floated and made mockups for), where there is a nice editor with intelligent completions. This is obviously not an MVP feature, and a later nice-to-have that should have its own ticket.
Wherever possible, this code should be written as cljc files, so that a maximum amount can (at least in theory) run both in the browser and on the server.
The suggested implementation for this is core.logic. This has some advantages:
https://github.com/jonase/eastwood
eastwood sometimes produces false positives, so perhaps it should not be gating. Keep in mind that eastwood does active analysis, i.e. it loads your code, so it is not okay for code to have import-time side effects.
Appropriate middleware might be:
["cider.nrepl/cider-middleware"
"refactor-nrepl.middleware/wrap-refactor"
"cemerick.piggieback/wrap-cljs-repl"]
Although it's not clear what happens when some of that's missing (e.g. wrap-refactor). If we do this before #71, it will make #71 a bit easier, since we won't have to document how to run (use 'figwheel-sidecar.repl-api) (figwheel-sidecar.repl-api/cljs-repl).
While desdemona is backed by Onyx, it necessarily has other requirements, including e.g. Kafka for ingestion. This is unlikely to be the last requirement. Once #2 is resolved, it would also be great to have e.g. a mimic instance running, mimicking Rackspace Cloud Files. Finally, with Carina, it would be great if we could set up realistic demo/CI environments on some pretty big iron.
docker-compose gives us docker-quality repeatable environments that are easy to set up, and hopefully still enables production-grade setups.
As a side-effect of resolving this, we should consider having exactly one way to run things in development, mimicking the new features in onyx-template where possible. This means that docker-compose up should be the way you set up a development cluster, and setting up a production cluster should look as similar to that as possible.
#29 (implemented by #36) lets you express:
(= (:ip x) "10.0.0.1")
... but:
(= "10.0.0.1" (:ip x))
... should be equivalent. This is not quite as trivial as it may look at first glance, since one might also say:
(= (:ip x) (:ip y))
... but that's a different thing, and should compile down differently: the latter can't just be expressed as a single featurec, but instead is a fresh variable and two featurec relations (one for x, one for y).
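A hypothetical normalization pass (names and shape are illustrative, not #36's actual code) could flip literal-first equalities so both spellings reach the same compilation path, while leaving accessor-accessor forms alone:

```clojure
;; Flip (= "literal" (:key var)) into (= (:key var) "literal").
;; Forms like (= (:ip x) (:ip y)) are left untouched; those need the
;; fresh-variable, two-featurec compilation described above.
(defn normalize-eq
  [[op a b :as form]]
  (if (and (= op '=) (not (seq? a)) (seq? b))
    (list op b a)
    form))

(normalize-eq '(= "10.0.0.1" (:ip x)))
;=> (= (:ip x) "10.0.0.1")
(normalize-eq '(= (:ip x) (:ip y)))
;=> (= (:ip x) (:ip y))
```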
Onyx currently has support for S3 in (but apparently not out). It would be great if we could have Rackspace Cloud Files support as a durable backend, since that gives us high-durability archiving with very low maintenance cost.
jClouds is how you talk to Rackspace Cloud from the JVM, and while they have had plans to support Clojure explicitly in the past, that support has all but languished. I have previously tried to improve that while working on rackerlabs/otter, at https://github.com/lvh/nordschleife/blob/master/src/nordschleife/auto_scale.clj#L54-L80, which may be interesting to an implementor.