
marionette's Issues

Validate Marionette formats before running models

We should do some basic validation of formats once they are loaded by the server/client. This includes verifying the graph structure (e.g., that there is a valid path from the start state to the end state) and the probability distributions of the transitions (e.g., that probabilities sum to 1). We might also consider a validation procedure for actions, so that we do not need to rely on run-time Python errors to discover problems.
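A minimal sketch of the two graph checks described above, assuming a format's transitions can be represented as (source, destination, probability) triples; the real loader's representation and state names may differ.

```python
import math
from collections import defaultdict, deque

def validate_format(transitions, start="start", end="end"):
    """Validate a model graph given as (src, dst, probability) triples.

    Hypothetical representation; returns a list of error strings
    (empty if the format passes both checks).
    """
    errors = []

    # 1. Outgoing transition probabilities from each state must sum to 1.
    outgoing = defaultdict(list)
    for src, dst, prob in transitions:
        outgoing[src].append((dst, prob))
    for state, edges in list(outgoing.items()):
        total = sum(p for _, p in edges)
        if not math.isclose(total, 1.0, rel_tol=1e-9):
            errors.append("probabilities out of %s sum to %g, not 1" % (state, total))

    # 2. There must be a valid path from the start state to the end state.
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        for dst, _ in outgoing[state]:
            if dst not in seen:
                seen.add(dst)
                queue.append(dst)
    if end not in seen:
        errors.append("no path from %s to %s" % (start, end))

    return errors
```

Running these checks at load time would surface broken formats before any model execution begins.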

Allow client to parameterize marionette format

The marionette client does not accept an input parameter to specify the marionette format. We should add one, along with a few tests in marionette.cli_tests that exercise marionette under several different formats.
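A sketch of what the new parameter could look like using argparse; the flag name `--format` and the format names shown are assumptions, not marionette's actual CLI surface.

```python
import argparse

def build_parser():
    # Hypothetical option; the actual flag name and default may differ.
    parser = argparse.ArgumentParser(prog="marionette_client")
    parser.add_argument("--format", default="http_simple_blocking",
                        help="name of the marionette format to execute")
    return parser

# Example invocation with an explicitly chosen format.
args = build_parser().parse_args(["--format", "ftp_simple_blocking"])
```

The cli_tests could then invoke the client once per format and assert that each starts up cleanly.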

Explicit loop back to start state from end state

Currently, model execution implicitly returns to the start state of the model after reaching the end state, effectively creating an infinitely running model. It would be a better user experience if the behavior needed to be explicitly described in the model format with a transition from the end state to the start state (with associated action blocks). With this new approach, model execution would halt if it reaches an end state that has no outgoing transitions.
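The proposed halting rule can be sketched as follows, again assuming a hypothetical (src, dst, probability) representation of transitions: execution only loops if the format explicitly contains an end-to-start transition, and halts at any state with no outgoing transitions.

```python
import random

def run_model(transitions, start="start", end="end", max_steps=1000):
    """Execute a model until it reaches a state with no outgoing
    transitions (or until max_steps, for explicitly looping models).

    Transitions are hypothetical (src, dst, probability) triples.
    """
    state, trace = start, [start]
    for _ in range(max_steps):
        choices = [(dst, p) for src, dst, p in transitions if src == state]
        if not choices:
            break  # no outgoing transitions: halt instead of restarting
        dsts, probs = zip(*choices)
        state = random.choices(dsts, weights=probs)[0]
        trace.append(state)
    return trace
```

With this rule, today's implicit behavior is recovered by adding an explicit `("end", "start", 1.0)` transition (with its associated action blocks) to the format.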

Improve documentation

We have a dearth of documentation; we should refine our README and add a handful of paragraphs that describe our goals and intentions.

Refactor model object cleanup methods

Currently, various objects used by models during their operation are cleaned up in a haphazard way.

For instance, termination of Marionette streams is handled in a number of different places, which causes disconnects between the various state cleanup procedures. For example, the Server cleans up its factory for the stream [1], but the actual stream state gets cleaned up in [2] and [3].

Relatedly, if any of these objects terminates operation abnormally (e.g., without an END_OF_STREAM message), it is never cleaned up and remains in memory for the remainder of the server/client's operation. A concrete example is the priority queues in the BufferIncoming class [4].

The code should be refactored so that we take advantage of destructors to traverse all related objects and clean them up appropriately. Another potential approach would be an automatic garbage-collection process, which may be necessary for objects that are referenced by multiple models/channels (like sockets). This may involve inactivity timeouts for objects that have not been used for long periods of time.

[1] https://github.com/redjack/marionette/blob/master/marionette_tg/__init__.py#L145
[2] https://github.com/redjack/marionette/blob/master/marionette_tg/multiplexer.py#L29
[3] https://github.com/redjack/marionette/blob/master/marionette_tg/multiplexer.py#L90
[4] https://github.com/kpdyer/marionette/blob/master/marionette_tg/multiplexer.py#L175
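One way to combine the destructor and inactivity-timeout ideas is Python's `weakref.finalize`, which runs a cleanup callback when an object is garbage collected. The sketch below is hypothetical (the class name and fields are invented), showing a single cleanup path that fires even when no END_OF_STREAM message arrives.

```python
import time
import weakref

class StreamState(object):
    """Hypothetical per-stream state (e.g. the BufferIncoming queues)."""

    def __init__(self, stream_id):
        self.stream_id = stream_id
        self.last_used = time.time()
        # Runs _cleanup when this object is garbage collected, so
        # resources are released even after an abnormal termination.
        self._finalizer = weakref.finalize(self, self._cleanup, stream_id)

    @staticmethod
    def _cleanup(stream_id):
        # The single place where all per-stream resources get released
        # (queues, counters, socket references, etc.).
        pass

    def touch(self):
        self.last_used = time.time()

def reap_inactive(streams, timeout):
    """Drop streams unused for longer than `timeout` seconds; dropping
    the last reference triggers the finalizer above."""
    now = time.time()
    for sid in [s for s, st in streams.items() if now - st.last_used > timeout]:
        del streams[sid]
```

Centralizing cleanup this way would let the Server, the multiplexer, and the buffers all rely on one release path instead of three.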

Strong client authentication and explicit stream separation

Currently, we rely on the use of randomly chosen stream IDs to enforce separation among different clients connecting through the same Marionette server. It would be nice to explicitly enforce this separation with some kind of client ID appended to the stream ID in the respective queues/buffers -- the incoming IP address, for instance.
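A minimal sketch of the composite-key idea, assuming the queues/buffers live in a dictionary (the real structures differ): keying by (client address, stream ID) means two clients that happen to choose the same stream ID can never collide.

```python
# Hypothetical: key per-stream buffers by (client address, stream id)
# rather than stream id alone.
buffers = {}

def buffer_for(client_addr, stream_id):
    key = (client_addr, stream_id)
    if key not in buffers:
        buffers[key] = []
    return buffers[key]
```

This makes the separation an invariant of the data structure instead of a property of random ID selection.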

In the long run, it would be good to have some kind of authentication mechanism built in during the handshake process, such as creating a well-defined, forward-secure session key.

Long-term solution to message-length control

At the moment, we rely on a very particular use of FTE to control message-length distributions: with a ".+"-style language, the length of the output ciphertext closely approximates that of the input plaintext. Therefore, we can simply divide the input cells into appropriate byte lengths to match our distributions. For message lengths too small to fit an FTE message, we can rank directly into the language to produce non-information-carrying messages of the appropriate size.
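The slicing step can be sketched as follows. This is a simplification under stated assumptions: `length_dist` is a hypothetical list of (length, probability) pairs, and an empty cell stands in for a directly ranked, non-information-carrying message.

```python
import random

def slice_payload(payload, length_dist, min_len):
    """Split `payload` into cells whose sizes follow a target length
    distribution, mimicking the current FTE-based approach.

    Lengths drawn below `min_len` cannot carry data, so an empty cell
    is emitted in place of direct ranking into the language.
    """
    lengths, probs = zip(*length_dist)
    cells, i = [], 0
    while i < len(payload):
        n = random.choices(lengths, weights=probs)[0]
        if n < min_len:
            cells.append(b"")  # chaff: non-information-carrying cell
        else:
            cells.append(payload[i:i + n])
            i += n
    return cells
```

The limitation discussed below is that this only works because the ".+" language preserves length; arbitrary FTE formats do not.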

In the long run, we'll want a more generic approach that works for a variety of FTE formats. Creating independent FTE objects with distinct slices (i.e., byte lengths) is unlikely to be practical from a performance standpoint because of the overhead of initializing the ranking matrix, particularly for large byte lengths. Likely, we'll want a middle ground that offloads some of that responsibility to the template grammars, while FTE is extended in other ways to be slightly more flexible.

travis-ci builds are failing

Right now, travis-ci builds are failing because of 91b4ba2. It looks like we're returning the result object from unittest.TextTestRunner.run when we should be returning a proper process exit code.

Refactor operation of channels/connections

Currently, every model is associated with one channel regardless of whether it handles any direct communication. For instance, in a hierarchical composition of models, even the higher-level model that merely spawns/orchestrates lower-level models will hold an active port.

Furthermore, these channels automatically open a connection (or start listening) at the start state of the model, and the channel remains open until the model stops operation -- there are many cases where we want finer-grained control over when it starts and stops. For example, in the FTP format we only want the spawned PASV connections to listen on the ephemeral port while communicating with the client, whereas in the amzn_sess format we want to keep listening on the port even when zero HTTP models are active.

Closing the listening ports after execution works for FTP but not for amzn_sess, while keeping the ports open indefinitely works for amzn_sess but exhausts file descriptors in the FTP case. We want a more nuanced approach that handles both scenarios.

Tasks for this refactor include:

  • Removing the required connection keyword
  • Moving channel-related operations into actions, thereby allowing fine-grained control over when a model listens on a port (server) and connects to its peer (client)
  • Decoupling channels from models so that higher-level models do not need to create channels, though they should still be allowed to create long-lived channels that they share with multiple child models
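The last two tasks can be sketched together. All names here are hypothetical, not marionette's actual classes: channel open/close become explicit operations that actions can invoke, and a model's channel is optional and shareable with its children.

```python
class Channel(object):
    """Hypothetical channel whose lifetime is driven by explicit
    actions rather than by a model's start and end states."""

    def __init__(self, port):
        self.port = port
        self.listening = False

    def open(self):   # would bind/listen in a real implementation
        self.listening = True

    def close(self):  # would release the file descriptor
        self.listening = False

class Model(object):
    def __init__(self, channel=None):
        # Higher-level models may pass channel=None (no I/O of their
        # own) or share one long-lived channel among child models.
        self.channel = channel

    def spawn_child(self):
        return Model(channel=self.channel)

# FTP-style usage: open a channel only for the ephemeral PASV transfer,
# then close it, so file descriptors are not exhausted.
pasv = Channel(port=0)
pasv.open()
pasv.close()
```

Under this scheme, amzn_sess would keep one shared channel open across HTTP model lifetimes, while FTP opens and closes a channel per PASV transfer.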
