
marionette


This code is still pre-alpha and is NOT suitable for any real-world deployment.

Overview

Marionette is a programmable client-server proxy that enables the user to control network traffic features with a lightweight domain-specific language. The marionette system is described in [2] and builds on ideas from other papers, such as Format-Transforming Encryption [1].

  1. Protocol Misidentification Made Easy with Format-Transforming Encryption url: https://kpdyer.com/publications/ccs2013-fte.pdf
  2. Marionette: A Programmable Network Traffic Obfuscation System url: https://kpdyer.com/publications/usenix2015-marionette.pdf

Installation

Ubuntu

$ sudo apt-get update && sudo apt-get upgrade
$ sudo apt-get install git libgmp-dev python-pip python-dev curl
$ git clone https://github.com/kpdyer/marionette.git
$ cd marionette
$ sudo pip install -r requirements.txt
$ python setup.py build
$ sudo python setup.py install

RedHat/Fedora/CentOS

$ sudo yum update
$ sudo yum install epel-release  # EPEL may be required for some distros
$ sudo yum groupinstall "Development Tools"
$ sudo yum install git gmp-devel python-pip python-devel curl
$ git clone https://github.com/kpdyer/marionette.git
$ cd marionette
$ sudo pip install -r requirements.txt
$ python setup.py build
$ sudo python setup.py install

OSX

Requires Homebrew.

$ brew install python gmp curl
$ git clone https://github.com/kpdyer/marionette.git
$ cd marionette
$ python setup.py install

Sanity check

$ python setup.py test
...
----------------------------------------------------------------------
Ran 13 tests in Xs

OK

Running

Then test end-to-end with the servers:

$ ./bin/socksserver --local_port 8081 &
$ ./bin/marionette_server --server_ip 127.0.0.1 --proxy_ip 127.0.0.1 --proxy_port 8081 --format dummy &
$ ./bin/marionette_client --server_ip 127.0.0.1 --client_ip 127.0.0.1 --client_port 8079 --format dummy &
$ curl --socks4a 127.0.0.1:8079 example.com

A complete list of options is available with the --help parameter.

marionette.conf

  • general.debug - [boolean] print useful debug information to the console
  • general.autoupdate - [boolean] enable automatic checks for new marionette formats
  • general.update_server - [string] the remote address of the server to use for marionette updates
  • client.client_ip - [string] the interface to listen on if it isn't specified on the CLI
  • client.client_port - [int] the port to listen on if it isn't specified on the CLI
  • server.server_ip - [string] the interface to listen on if it isn't specified on the CLI
  • server.proxy_ip - [string] the interface to forward connections to if it isn't specified on the CLI
  • server.proxy_port - [int] the port to forward connections to if it isn't specified on the CLI
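
For reference, a minimal marionette.conf might look like the following. The section/key layout here is an assumption inferred from the dotted option names above; consult the marionette.conf shipped with the repository for the authoritative format.

```
[general]
debug = false
autoupdate = false
update_server = https://updates.example.com

[client]
client_ip = 127.0.0.1
client_port = 8079

[server]
server_ip = 127.0.0.1
proxy_ip = 127.0.0.1
proxy_port = 8081
```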

Marionette DSL

Marionette's DSL has the following structure:

connection([connection_type], [port]):
  start [dst] [block_name] [prob | error]
  [src] [dst] [block_name] [prob | error]
  ...
  [src] end [block_name] [prob | error]

action [block_name]:
  [client | server] [module].[func](arg1, arg2, ...)
  [client | server] [module].[func](arg1, arg2, ...) [if regex_match_incoming(regex)]
...

The only connection_type currently supported is tcp. The port specifies the port that the server listens on and the client connects to. The block_name names the action block that should be executed when transitioning from src to dst. A single error transition may be specified for each src; it is executed if all other potential transitions from src are impossible.

Action blocks specify actions performed by either the client or the server. For brevity, a format specifies only the sending side of an action, such as fte.send.

Marionette Plugins

  • fte.send(regex, msg_len) - encrypts a string with FTE under regex and sends it on the channel.
  • fte.send_async(regex, msg_len) - same as fte.send, but does not block on the receiver side while waiting for the incoming message.
  • tg.send(grammar_name) - sends a message using the template grammar grammar_name.
  • io.puts(str) - sends the string str on the channel.
  • model.sleep(n) - sleeps for n seconds.
  • model.spawn(format_name, n) - spawns n instances of the model format_name and blocks until they complete.

Note: specifying a send or a puts implicitly invokes the corresponding recv or gets on the receiver side.

Example Formats

Simple HTTP

The following format establishes a TCP connection, sends one upstream GET, and follows it with a downstream OK.

connection(tcp, 80):
  start  client NULL     1.0
  client server http_get 1.0
  server end    http_ok  1.0

action http_get:
  client fte.send("^GET\ \/([a-zA-Z0-9\.\/]*) HTTP/1\.1\r\n\r\n$", 128)

action http_ok:
  server fte.send("^HTTP/1\.1\ 200 OK\r\nContent-Type:\ ([a-zA-Z0-9]+)\r\n\r\n\C*$", 128)

HTTP with error transitions and conditionals

We use error transitions in the following format to handle incoming connections that aren't from a marionette client. The conditionals match a regex against the incoming request.

connection(tcp, 8080):
  start          upstream       NULL        1.0
  upstream       downstream     http_get    1.0
  upstream       downstream_err NULL        error
  downstream_err end            http_ok_err 1.0
  downstream     end            http_ok     1.0

action http_get:
  client fte.send("^GET\ \/([a-zA-Z0-9\.\/]*) HTTP/1\.1\r\n\r\n$", 128)

action http_ok:
  server fte.send("^HTTP/1\.1\ 200 OK\r\nContent-Type:\ ([a-zA-Z0-9]+)\r\n\r\n\C*$", 128)

action http_ok_err:
  server io.puts("HTTP/1.1 200 OK\r\n\r\nHello, World!") if regex_match_incoming("^GET /(index\.html)? HTTP/1\.(0|1).*")
  server io.puts("HTTP/1.1 404 File Not Found\r\n\r\nFile not found!") if regex_match_incoming("^GET /.* HTTP/1\.(0|1).*")
  server io.puts("HTTP/1.1 400 Bad Request\r\n\r\nBad request!") if regex_match_incoming("^.+")

Contributors

ctanzini, kpdyer, p1ck


Issues

Validate Marionette formats before running models

We should do some basic validation of formats once they are loaded by the server/client. This includes verifying the graph structure (e.g., there is a valid path from start to end state) and probability distributions of the transitions (e.g., probabilities sum to 1). We might also consider a validation procedure for actions, as well, so that we do not need to rely on real-time Python errors to discover problems.
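
A minimal sketch of such a validator, assuming transitions are available as (src, dst, prob) tuples where prob is either a float or the string "error" (a simplification of the real in-memory representation):

```python
from collections import defaultdict

def validate_format(transitions):
    """Check that non-error probabilities out of each state sum to 1
    and that 'end' is reachable from 'start'."""
    out_prob = defaultdict(float)
    adjacency = defaultdict(set)
    for src, dst, prob in transitions:
        adjacency[src].add(dst)
        if prob != "error":
            out_prob[src] += prob
    # Probabilities of the non-error transitions out of each state must sum to 1.
    if any(abs(total - 1.0) > 1e-9 for total in out_prob.values()):
        return False
    # Depth-first search from 'start' to confirm 'end' is reachable.
    seen, stack = set(), ["start"]
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        stack.extend(adjacency[state])
    return "end" in seen
```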

Refactor operation of channels/connections

Currently, every model is associated with one channel, regardless of whether it handles any direct communication. For instance, a hierarchical composition of models will have an active port even for the higher-level model that just spawns/orchestrates lower-level models.

Furthermore, these channels automatically open a connection (or start listening) at the model's start state and remain open until the model stops -- there are many cases where we want finer-grained control over start and stop. For example, in the FTP format we only want the spawned PASV connections listening on the ephemeral port while communicating with the client, while in the amzn_sess format we want to keep listening on the port even when zero HTTP models are active.

Closing the listening ports after execution works for FTP but not for the amzn_sess case, while keeping the ports open indefinitely works for amzn_sess but causes exhaustion of file descriptors in the FTP case. We want a more subtle approach that can work well for both of these potential scenarios.

Tasks for this refactor include:

  • Removing the required connection keyword
  • Moving channel-related operations into actions, thereby allowing fine-grained control over when a model listens on a port (server) and connects to its peer (client)
  • Decoupling channels from models so that higher-level models do not need to create channels, though they should still be allowed to create long-lasting channels that they share with multiple child models

Strong client authentication and explicit stream separation

Currently, we rely on the use of randomly chosen stream IDs to enforce separation among different clients connecting through the same Marionette server. It would be nice to explicitly enforce this separation with some kind of client ID appended to the stream ID in the respective queues/buffers -- the incoming IP address, for instance.

In the long run, it would be good to have some kind of authentication mechanism built in during the handshake process, such as creating a well-defined, forward-secure session key.
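
As an illustration of the first idea, keying per-stream state by (client address, stream ID) rather than stream ID alone guarantees separation even when two clients pick the same random ID. This is a hypothetical sketch, not the actual multiplexer code:

```python
class StreamTable:
    """Toy per-client stream table: buffers are keyed by
    (client_ip, stream_id), so streams from different clients can
    never collide, even with identical random stream IDs."""

    def __init__(self):
        self._buffers = {}

    def buffer_for(self, client_ip, stream_id):
        # setdefault creates the buffer on first use and returns the
        # same object on every subsequent lookup with the same key.
        return self._buffers.setdefault((client_ip, stream_id), [])
```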

Refactor model object cleanup methods

Currently, various objects used by models during their operation are cleaned up in a haphazard way.

For instance, termination of Marionette streams is handled in a number of different places, which leads to disconnects between the various state-cleanup procedures: the Server cleans up its factory for the stream in [1], but the actual stream state gets cleaned up in [2] and [3].

Relatedly, if any of these objects terminate operation abnormally (e.g., no END_OF_STREAM message), they are never cleaned up and remain in memory for the remainder of the server/client's operation. A concrete example is the priority queues in the BufferIncoming class [4].

The code should be refactored to take advantage of destructors that walk all related objects and clean them up appropriately. Another potential approach is an automatic garbage-collection process, which may be necessary for objects referenced by multiple models/channels (like sockets). This may involve inactivity timeouts for objects that have not been used for long periods of time.

[1] https://github.com/redjack/marionette/blob/master/marionette_tg/__init__.py#L145
[2] https://github.com/redjack/marionette/blob/master/marionette_tg/multiplexer.py#L29
[3] https://github.com/redjack/marionette/blob/master/marionette_tg/multiplexer.py#L90
[4] https://github.com/kpdyer/marionette/blob/master/marionette_tg/multiplexer.py#L175
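
One way to realize the destructor idea in Python is weakref.finalize, which runs a callback when an object is collected. The Stream class and buffer registry below are hypothetical stand-ins, not the real classes:

```python
import weakref

class Stream:
    def __init__(self, stream_id, buffers):
        self.stream_id = stream_id
        buffers[stream_id] = []
        # The finalizer fires when this Stream is garbage-collected, so
        # the registry entry is reclaimed even if no END_OF_STREAM
        # message ever arrives. The callback must not reference self,
        # or the object would never become collectable.
        self._finalizer = weakref.finalize(self, buffers.pop, stream_id, None)

buffers = {}
stream = Stream(7, buffers)
del stream  # in CPython the finalizer runs here and empties the registry
```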

travis-ci builds are failing

Right now, travis-ci builds are failing because of 91b4ba2. It looks like we're returning the object from unittest.TextTestRunner.run, but we should be returning a proper return code.
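
The fix is presumably to convert the TestResult into an integer exit status before handing it to the process; a minimal sketch of the pattern (SmokeTest is a placeholder, not the project's suite):

```python
import unittest

class SmokeTest(unittest.TestCase):
    def test_ok(self):
        self.assertEqual(1 + 1, 2)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(SmokeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
# run() returns a TestResult object, not an int; CI needs a proper
# return code, so derive one from wasSuccessful() for sys.exit().
exit_code = 0 if result.wasSuccessful() else 1
```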

Improve documentation

We have a dearth of documentation; we should refine our README and add a handful of paragraphs describing our goals and intentions.

Long-term solution to message-length control

At the moment, we rely on a very particular use of FTE to control message-length distributions. Namely, with a ".+"-style language, the length of the output ciphertext closely approximates that of the input plaintext. Therefore, we can simply divide the input cells into appropriate byte lengths to match our target distributions. For message lengths too small to fit an FTE message, we can rank directly into the language to produce non-information-carrying messages of the appropriate size.
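
The cell-division step can be sketched as a simple chunking routine (illustrative only; the function name and the real logic in the multiplexer are assumptions):

```python
def split_into_cells(payload: bytes, cell_len: int):
    """Split plaintext into cells of at most cell_len bytes, so that
    each cell's FTE ciphertext lands near a target message length."""
    return [payload[i:i + cell_len] for i in range(0, len(payload), cell_len)]
```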

In the long run, we'll want a more generic approach that works for a variety of FTE formats. Creating independent FTE objects with distinct slices (i.e., byte lengths) is unlikely to be practical from a performance standpoint because of the overhead of initializing the ranking matrix, particularly for large byte lengths. Likely, we'll want a middle ground that offloads some of that responsibility to the template grammars, while FTE is extended in other ways to be slightly more flexible.

Allow client to parameterize marionette format

The marionette client does not accept an input parameter to specify the marionette format. We should add one, along with a few tests in marionette.cli_tests that exercise marionette under different formats.

Explicit loop back to start state from end state

Currently, model execution implicitly returns to the start state of the model after reaching the end state, effectively creating an infinitely running model. It would be a better user experience if the behavior needed to be explicitly described in the model format with a transition from the end state to the start state (with associated action blocks). With this new approach, model execution would halt if it reaches an end state that has no outgoing transitions.
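
Under the proposed semantics, the Simple HTTP format above could opt into looping explicitly with a transition from end back to start (hypothetical syntax, assuming a NULL action on the loop edge):

```
connection(tcp, 80):
  start  client NULL     1.0
  client server http_get 1.0
  server end    http_ok  1.0
  end    start  NULL     1.0
```

A format that omits the final transition would simply halt upon reaching end.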
