
desdemona's People

Contributors

derwolfe, ehashman, lvh, reaperhulk, sirsean, smogg


desdemona's Issues

Dashboard MVP

#51

The dashboard is where visualizations live. For an MVP, we should have one dashboard available to the user.
The user should be able to:

  • rearrange/add/remove visualizations
  • click on visualizations to dig deeper into the data

Add support for more Karma launchers

Currently only Firefox is supported, because that was the only one that was easy to get running in Travis. However, we should really also run the tests in Safari, Chrome, and maybe even Opera. This is well supported by Karma; we just need to wrangle a few things:

  • Add appropriate npm Karma plugins
  • Add Travis support for testing with Safari on OS X
  • Figure out which platform it's easiest to get Chrome and Opera on (it may be Travis' OS X builders) and run it there

Default docker-machine has insufficient RAM to set up docker-compose

Currently, the Kafka JVM attempts to pre-allocate 1024MB all by itself. This is problematic because the standard docker-machine VM only has 1024MB total, so the malloc fails, so the Kafka container falls over immediately. This is unrelated to #45 as far as I can tell (other than that it's also about Kafka).

lvh@zygalski ~/P/r/desdemona (master)> docker-machine create --driver virtualbox rax2                                                                                                               09:18:22
Running pre-create checks...
(rax2) Default Boot2Docker ISO is out-of-date, downloading the latest release...
(rax2) Latest release for github.com/boot2docker/boot2docker is v1.10.2
(rax2) Downloading /Users/lvh/.docker/machine/cache/boot2docker.iso from https://github.com/boot2docker/boot2docker/releases/download/v1.10.2/boot2docker.iso...
(rax2) 0%....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
Creating machine...
(rax2) Copying /Users/lvh/.docker/machine/cache/boot2docker.iso to /Users/lvh/.docker/machine/machines/rax2/boot2docker.iso...
(rax2) Creating VirtualBox VM...
(rax2) Creating SSH key...
(rax2) Starting the VM...
(rax2) Check network to re-create if needed...
(rax2) Waiting for an IP...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with boot2docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
Docker is up and running!
To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env rax2
lvh@zygalski ~/P/r/desdemona (master)> VboxManage showvminfo rax2 | grep Memory                                                                                                                     09:20:52
Memory size:     1024MB

One way to mitigate this immediately would be to document it, so that people can resize their VM (e.g. by recreating it with docker-machine's --virtualbox-memory flag). Medium-term it would make sense to have Kafka try to allocate less memory, since presumably many people will be using the default VM. Longer-term it may be useful to make that amount of memory configurable; a production setup could certainly want more than 1024MB.

Add kibit support to CI

https://github.com/jonase/kibit

This is particularly useful because we have a lot of folks new to Clojure, and it will help them learn idioms more quickly. That can also be done through review, but using kibit instead frees up reviewer time.

docker-compose output from individual containers should be part of CI output

Right now, docker-compose is called with the -d (detached) flag. This prevents output from individual containers, e.g.:

test_1      |
test_1      | lein test desdemona.functions.sample-functions-test
test_1      |
test_1      | lein test desdemona.jobs.sample-job-test
test_1      |
test_1      | lein test desdemona.query-test
test_1      |
test_1      | lein test desdemona.tasks.kafka-test
test_1      |
test_1      | lein test desdemona.workflows.sample-workflow-test
test_1      |
test_1      | Ran 9 tests containing 23 assertions.
test_1      | 0 failures, 0 errors.
desdemona_test_1 exited with code 0

... from being displayed. That seems like incredibly useful information, and it should be part of the logs.

One way to do this might be to run docker-compose up in the background with &, but it's not clear how well that interacts with fd redirects and Travis. Alternatively, docker-compose logs could be used to dump container output after the run.

Add development guidelines

Desdemona should have a DEVELOPING.md that includes:

  • Opinions on how to do software development (CI, issue tracker, PRs &c)
  • Places to get in touch with the other people developing it

Add fake data storage for the development environment

Right now, the only way to do archival is Rackspace Cloud Files (see #2). That's unfortunate: it means that a development environment can't test those components without access to the network, and to a specific service at that. It would be much preferable if you could just docker-compose up (#7) and have that work all by itself. Simply making long-term storage unavailable is unacceptable: long-term storage is critical to how desdemona operates, so that would make the development environment unrealistic.

There are a few potential ways to resolve this:

  1. Abstract away the storage API, provide a simple (in-memory?) alternative to Cloud Files
  2. Provide fakes for Cloud Files

Both have interesting trade-offs. Plan 1 is obviously nice because it opens up the possibility of other storage backends. If we go with plan 1, most of the work in plan 2 should still happen, albeit with a potentially lower priority: plan 2 is how you provide high-assurance testing for the Rackspace Cloud Files backend.

Plan 2 has a number of parts:

  • The upstream project mimic needs to learn how to speak OpenStack Swift/Rackspace Cloud Files. Fortunately mimic has a friendly and active development environment, so adding features to it should be fairly painless.
  • The Rackspace Cloud Files jClouds API may or may not be OK with a different identity API URL, which is how Mimic expects to work.

It's not clear if this should be done before or after #2, which is about adding support for Rackspace Cloud Files altogether.

Docker environment doesn't have npm, breaking the build

Right now, docker build (and by extension docker-compose build) fails because lein deps tries to install all of the dependencies. That includes karma + plugins, recently introduced for Clojurescript.

Since the npm packages are only required for testing, it's probably best to just make sure that a plain lein deps doesn't end up trying to install them. This would mean splitting project.clj into several profiles.
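A sketch of what that split might look like (the profile name and dependency coordinates below are illustrative, not the project's actual ones):

```clojure
;; project.clj (sketch): keep npm-backed test tooling out of the
;; default dependency set, so a bare `lein deps` works in Docker.
(defproject desdemona "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.8.0"]]
  :profiles
  {;; only activated for the ClojureScript/karma test runs,
   ;; e.g. via `lein with-profile +cljs-test test`
   :cljs-test {:dependencies [[org.clojure/clojurescript "1.7.228"]]}})
```

With this shape, the Docker build only resolves the default :dependencies, and CI opts into the heavier profile explicitly.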

This was not evident from CI because docker-compose build is currently non-gating. It's non-gating because of #45.

Kibit gates on two-form thread exprs (->, ->>) which is not always desirable

Kibit has rules about two-form thread exprs. It complains about things like (->> f a) and (-> f a) and tells you to rewrite them as (f a). That certainly looks reasonable, but sometimes the threading macro really is what you want. For example, consider manifold.deferred's catch, whose documentation suggests:

(-> (form that throws)
    (md/catch Exception (handler)))

That's what upstream thinks looks nicest, and there appears to be consensus on our development team as well (@derwolfe, @lvh). The desugared form, (md/catch (form that throws) Exception (handler)), while obviously equivalent, is harder to read and understand; you want to think about what you're doing first before you think about error handling. The problem goes away when you add more fns to that pipeline; then kibit is happy with the form.

Potential solutions:

  • By the time we get to this ticket, someone has already solved this somehow in kibit (check!)
  • Amend kibit to conditionally enable/disable some checks, or make some of them non-gating. This potentially adds a lot of complexity to kibit, though; and we don't know if upstream agrees that that's a good thing.
  • Amend kibit to maintain a list of exceptions to that rule, i.e. it's OK if it's md/catch. This has limitations: AFAIK kibit doesn't really import your code, so it has to count on statically resolvable things to figure out what a particular fn is; and, right now, kibit works in a "peephole" fashion (that is, it only looks at one form at a time), so it would e.g. not be easy or pleasant to also look at an ns form.
  • Workaround: assign the "happy case" deferred to a name in a let binding (not very good) or deal with the inside-out form (not very good).

Get rid of the explicit namespace prefix for querying relations

#27 introduces something that runs core.logic. The query it introduces still specifies the namespace for core.logic explicitly, e.g. specifying l/featurec as the relation. However, that is a pure accident of how core.logic is aliased in desdemona.query's :require. This may be a thing #29 solves by accident by introducing a compiler.

Enforce whitespace as part of CI

Right now, some files (e.g. utils.clj) had lines ending in whitespace. This was not caught by CI (kibit and eastwood rely on post-reader data structures, so they wouldn't notice; neither did cljfmt). I noticed this while trying to get the codebase into a state that kibit would pass.

Add logical conjunction (and) to the query language

This is part of the requirements set out in #17. It should probably be done after #29, since that will give us something to conjoin and a compiler to hook into. It should probably also be done after we have some property other than IPs to extract from segments (although for testing we can just pretend that's the case for now).

This should look something like:

(and (= (:ip x) "10.0.0.1") (= (:type x) "egress"))

This maps internally to a single conjunctive conde clause, presumably.
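A minimal core.logic sketch of that mapping (the function name and event shapes are illustrative): conjunction inside a single conde clause is just a sequence of goals.

```clojure
(require '[clojure.core.logic :as l])

;; Sketch: the `and` above could compile to one conde clause in which
;; both featurec goals appear, i.e. a conjunction:
(defn egress-from-ip
  "Find events (maps) whose :ip and :type both match."
  [events]
  (l/run* [x]
    (l/membero x events)
    (l/conde
     [(l/featurec x {:ip "10.0.0.1"})       ; first conjunct
      (l/featurec x {:type "egress"})])))   ; second conjunct, same clause
```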

Kafka container prematurely exits about half of the time

We saw an example of this in our latest Travis build on my Docker PR, and I've also encountered it in my dev environment. We should track down what the problem is and fix it if possible.

Current workaround is to run docker-compose up again until the Kafka container doesn't die.

Add ClojureScript code coverage

All ClojureScript code is tested, but we don't measure its test coverage. This is primarily due to a limitation in cloverage. We should measure coverage for both Clojure and ClojureScript and merge the results.

Basic user interface

A user needs a graphical interface to display and filter ingested data.

Optimize relational access to events

The spike introduced in #27 splices events into the form as a macro. This is unfortunate from a performance perspective when the events list is large.

This can be done by writing a relation, or possibly we can just add the events to the eval context.

Add EditorConfig

EditorConfig appears to be the primary way you communicate how you want your files edited. It'd be nice if we had that in the repo and CONTRIBUTING.md mentioned its existence.
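A sketch of what that .editorconfig might contain (the specific values below are suggestions, not settled project decisions):

```ini
# .editorconfig (sketch) — values are illustrative
root = true

[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true

[*.{clj,cljs,cljc,edn}]
indent_style = space
indent_size = 2
```

The trim_trailing_whitespace setting would also help with the trailing-whitespace issue elsewhere in this tracker.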

Document project architecture

It would be cool if someone would put together some documentation (that we could maintain) on the architecture of desdemona. Bonus points for diagrams?

Searching and filtering UI

The interface should support the user typing a query by:

  • Loading available keys and suggesting them as the user types
  • Providing alternative ways to update query values based on the type of data, through appropriate UI elements (e.g. a drag slider for numeric values)
  • Informing the user (when possible) how changing the query would affect the results (if I click X, how many results will I get back?)
  • Giving commonly changed query parts their own place in the UI (e.g. a timeframe dropdown)

Any interaction that can be translated into the query language should be reflected in the query input.

Add logical disjunction (or) to the query language

This is part of the requirements set out in #17. It should probably be done after #29, since that will give us something to disjunct, and mean that we have a compiler to hook in to.

This should look something like:

(or (= (:ip x) "10.0.0.1") (= (:ip x) "10.0.0.2"))

This maps internally to conde, presumably.
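A core.logic sketch of that mapping (the function name is illustrative): each disjunct becomes its own conde clause.

```clojure
(require '[clojure.core.logic :as l])

;; Sketch: the `or` above becomes two conde clauses, one per disjunct
(defn events-from-either-ip
  "Find events (maps) whose :ip is either of the two literals."
  [events]
  (l/run* [x]
    (l/membero x events)
    (l/conde
     [(l/featurec x {:ip "10.0.0.1"})
     ][(l/featurec x {:ip "10.0.0.2"})])))
```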

Event stream MVP

  • Interface should display the events available to the user based on the currently applied query
  • User should be able to decide which parts of the data are displayed
  • Interface should offer additional interactions allowing the user to edit the query directly from where events are displayed
  • Interface should allow the user to access the raw data
  • User should be able to select a single event and get a unique link to it

Basic filtering

We need to be able to filter stored data per UI requests.

Separate "build" vs "run" containers

We currently have npm installed in our image, just for the build process (as of #78). But npm is not useful for actually running the software.

We should have separate "build" and "run" containers, such that the "run" container only includes what's necessary to run and not everything that's necessary to build.

Kafka ingestion

Desdemona should be able to ingest Onyx segments through Kafka. This gives us a better separation of concerns for deliverability of e.g. syslog messages: syslog talks to Kafka (syslog-ng supports that), and Kafka talks to Onyx, which is already supported.

Kafka should probably piggyback on the Zookeeper cluster which Onyx is already running for us.

Document architecture, design guidelines

Desdemona is obviously a complex piece of kit. We should document what we think it looks like, and what assumptions we're making in its design, to make sure we're all on the same page.

Normalization of ingested syslog data

We need to determine a standardized schema for all our ingested data from syslog.

In particular, we want to ensure that each log entry/alert (hereafter "event") is annotated with:

  • timestamp
  • the client generating the event
  • the customer id of the client generating the event
  • the type of event (e.g., Alert Logic, Falconhose...)
  • source IP/port
  • destination IP/port

Data visualization MVP

User should be able to save queries as visualizations for later use within dashboards.
At a very minimum, the UI should:

  • support a few simple charts, a table, a markdown/WYSIWYG panel (not necessarily a data visualization, but lives here for lack of a better place)
  • let user name/edit/remove visualizations
  • let user add/remove visualizations from Dashboard

Implement query language

This ticket is more of a brain dump and a roadmap/central discussion point for figuring out which subtickets should exist to get the query language we need.

At a very minimum, a query language should allow unification and disunification with literals; e.g. "where the IP is 12.34.56.78" or "where the IP is not 12.34.56.78". It should support that on arbitrary properties.

A query language should also support (arbitrarily nested) conjunctions and disjunctions of terms, e.g. "where (the IP is p.q.r.s AND we've seen lots of traffic for it) OR (the IP is h.i.j.k and traffic to that IP is unusual)".

Queries must be validated by a schema, so that we don't accidentally expose arbitrary computation.

A useful query language should support logic variables. After all, as a SOC analyst, I don't just want to check for a particular IP or subnet; I want queries like "where the IP is one of these, and that IP is designated as a potential threat" without having to specify that set literally; I just want to specify that we're talking about the same IPs p, q... This doesn't have to be part of the initial implementation; it is a nice-to-have.

For training purposes, it would also be very nice to have the option of "explaining" a query, and potentially even making it bidirectionally writable (this is an idea @smogg has floated and made mockups for), where there is a nice editor with intelligent completions. This is obviously not an MVP feature, and a later nice-to-have that should have its own ticket.

Wherever possible, this code should be written as cljc files, so that as much as possible can (at least in theory) run both in the browser and on the JVM side.

The suggested implementation for this is core.logic. This has some advantages:

  • Natively supports all of the constraints above
  • Battle-tested (that is what ThreatGRID did); we know that this is something a very similar user group has figured out how to master
  • Allows programs to be run "backwards" i.e. produce samples that would have triggered this rule; useful for debugging and refining tests
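The literal unification and disunification requirement above can be sketched in core.logic as follows (event data and shapes here are illustrative):

```clojure
(require '[clojure.core.logic :as l])

(def events
  [{:ip "12.34.56.78" :port 443}
   {:ip "10.0.0.1"    :port 22}])

;; "where the IP is 12.34.56.78": unification with a literal
(l/run* [e]
  (l/membero e events)
  (l/featurec e {:ip "12.34.56.78"}))

;; "where the IP is not 12.34.56.78": disunification via !=
(l/run* [e]
  (l/membero e events)
  (l/fresh [ip]
    (l/featurec e {:ip ip})
    (l/!= ip "12.34.56.78")))
```

The nested conjunction/disjunction and logic-variable requirements then fall out of composing conde, fresh, and additional goals.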

Add eastwood to CI

https://github.com/jonase/eastwood

eastwood sometimes produces false positives, so perhaps it should not be gating. Keep in mind that eastwood does active analysis, i.e. it reads your code, so that means it is not okay for code to have import-time side effects.

Add figwheel nrepl middleware

Appropriate middleware might be:

["cider.nrepl/cider-middleware"
 "refactor-nrepl.middleware/wrap-refactor"
 "cemerick.piggieback/wrap-cljs-repl"]

Although it's not clear what happens when some of those are missing (e.g. wrap-refactor). If we do this before #71, it will make #71 a bit easier, since we won't have to document how to run (use 'figwheel-sidecar.repl-api) (figwheel-sidecar.repl-api/cljs-repl).
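For instance, figwheel's configuration in project.clj takes the middleware as strings (a sketch; the :nrepl-port value is illustrative):

```clojure
;; project.clj fragment (sketch): figwheel's :nrepl-middleware key
:figwheel {:nrepl-port 7888
           :nrepl-middleware ["cider.nrepl/cider-middleware"
                              "refactor-nrepl.middleware/wrap-refactor"
                              "cemerick.piggieback/wrap-cljs-repl"]}
```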

Use docker-compose

While desdemona is backed by Onyx, it necessarily has other requirements, including e.g. Kafka for ingestion. This is unlikely to be the last requirement. Once #2 is resolved, it would also be great to have e.g. a mimic instance running, mimicking Rackspace Cloud Files. Finally, with Carina, it would be great if we could set up realistic demo/CI environments on some pretty big iron.

docker-compose gives us docker-quality repeatable environments that are easy to set up, and hopefully still enables production-grade setups.

As a side-effect of resolving this, we should consider having exactly 1 way to run things in development, mimicking the new features in onyx-template where possible. This means that docker-compose up should be the way you set up a development cluster, and setting up a production cluster should look as similar to that as possible.

featurec-equivalent = should be commutative

#29 (implemented by #36) lets you express:

(= (:ip x) "10.0.0.1")

... but:

(= "10.0.0.1" (:ip x))

... should be equivalent. This is not quite as trivial as it may look at first glance, since one might also say:

(= (:ip x) (:ip y))

... but that's a different thing, and should compile down differently: the latter can't just be expressed as a single featurec, but instead is a fresh variable and two featurec relations (one for x, one for y).
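The distinction above can be sketched in core.logic (the goal names here are illustrative):

```clojure
(require '[clojure.core.logic :as l])

;; Both literal orderings can compile to the same single featurec goal:
(defn has-ip [x ip] (l/featurec x {:ip ip}))

;; ...whereas comparing two lvalues needs a fresh variable shared by
;; two featurec relations, one per map:
(defn same-ip
  "Goal that succeeds when maps x and y share the same :ip."
  [x y]
  (l/fresh [ip]
    (l/featurec x {:ip ip})
    (l/featurec y {:ip ip})))
```

So the compiler has to notice whether one side of the = is a literal or another accessor before picking a compilation strategy.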

Read and write from Rackspace Cloud Files

Onyx currently has support for S3 input (but apparently not output). It would be great if we could have Rackspace Cloud Files support as a durable backend, since that would give us high-durability archiving at a very low maintenance cost.

jClouds is how you talk to Rackspace Cloud from the JVM. While they have had plans to support Clojure explicitly in the past, that support has all but languished. I have previously tried to improve it while working on rackerlabs/otter (see https://github.com/lvh/nordschleife/blob/master/src/nordschleife/auto_scale.clj#L54-L80), which may be interesting to an implementor.
