Turbinia

Summary

Turbinia is an open-source framework for deploying, managing, and running distributed forensic workloads. It is intended to automate the running of common forensic processing tools (e.g. Plaso, TSK, strings) to help process evidence in the Cloud, scale the processing of large amounts of evidence, and decrease response time by parallelizing processing where possible.

How it works

Turbinia is composed of different components for the client, the server, and the workers. These components can run in the Cloud, on local machines, or as a hybrid of both. The Turbinia client sends evidence-processing requests to the Turbinia server. The server creates logical jobs from these incoming requests, and each job creates and schedules the forensic processing tasks to be run by the workers. Where possible, a job splits up the evidence to be processed, creating many tasks so the evidence can be processed in parallel. One or more workers run continuously to process tasks from the server. Any new evidence created or discovered by the tasks is fed back into Turbinia for further processing.
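The job-to-task fan-out described above can be sketched as follows. This is an illustrative sketch only; the function names here are hypothetical and do not match Turbinia's real Job and Task classes:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical helper names for illustration; Turbinia's real classes differ.
def split_evidence(evidence, chunks):
    """A job splits evidence into independent units where possible."""
    return [f"{evidence}:part{i}" for i in range(chunks)]

def run_task(part):
    """A worker task processes one unit of evidence."""
    return f"processed({part})"

def run_job(evidence, chunks=4):
    """The server schedules one task per unit; workers run them in parallel."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_task, split_evidence(evidence, chunks)))
```

In the real system the tasks are distributed to separate worker machines rather than threads, and any new evidence a task produces is fed back in as a fresh request.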

Communication from the client to the server is currently done with either Google Cloud PubSub or Kombu messaging. The worker implementation can use either PSQ (a Google Cloud PubSub Task Queue) or Celery for task scheduling.

The main documentation for Turbinia can be found here. You can also find out more about the architecture and how it works here.

Status

Turbinia is currently in Alpha release.

Installation

There is an installation guide here.

Usage

The basic steps to get things running after the initial installation and configuration are:

  • Start Turbinia server component with turbiniactl server command
  • Start Turbinia API server component with turbiniactl api_server command if using Celery
  • Start one or more Turbinia workers with turbiniactl celeryworker if using Celery, or turbiniactl psqworker if using PSQ
  • Install turbinia-client via pip install turbinia-client
  • Send evidence to be processed from the turbinia client with turbinia-client submit ${evidencetype}
  • Check status of running tasks with turbinia-client status

The turbinia-client tool can be used to interact with Turbinia through the API server component. Basic usage:

$ turbinia-client -h
Usage: turbinia-client [OPTIONS] COMMAND [ARGS]...

  Turbinia API command-line tool (turbinia-client).

                          ***    ***
                           *          *
                      ***             ******
                     *                      *
                     **      *   *  **     ,*
                       *******  * ********
                              *  * *
                              *  * *
                              %%%%%%
                              %%%%%%
                     %%%%%%%%%%%%%%%       %%%%%%
               %%%%%%%%%%%%%%%%%%%%%      %%%%%%%
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  ** *******
  %%                                                   %%  ***************
  %%                                (%%%%%%%%%%%%%%%%%%%  *****  **
    %%%%%        %%%%%%%%%%%%%%%
    %%%%%%%%%%                     %%          **             ***
       %%%                         %%  %%             %%%           %%%%,
       %%%      %%%   %%%   %%%%%  %%%   %%%   %%  %%%   %%%  %%%       (%%
       %%%      %%%   %%%  %%%     %%     %%/  %%  %%%   %%%  %%%  %%%%%%%%
       %%%      %%%   %%%  %%%     %%%   %%%   %%  %%%   %%%  %%% %%%   %%%
       %%%        %%%%%    %%%       %%%%%     %%  %%%    %%  %%%   %%%%%

  This command-line tool interacts with Turbinia's API server.

  You can specify the API server location in ~/.turbinia_api_config.json

Options:
  -c, --config_instance TEXT  A Turbinia instance configuration name.
                              [default: (dynamic)]
  -p, --config_path TEXT      Path to the .turbinia_api_config.json file.
                              [default: (dynamic)]
  -h, --help                  Show this message and exit.

Commands:
  config    Get Turbinia configuration.
  evidence  Get or upload Turbinia evidence.
  jobs      Get a list of enabled Turbinia jobs.
  result    Get Turbinia request or task results.
  status    Get Turbinia request or task status.
  submit    Submit new requests to the Turbinia API server.

Check out the turbinia-client documentation page for a detailed user guide.
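The help text above references ~/.turbinia_api_config.json for specifying the API server location. A minimal sketch of what such a file might contain is shown below; the field names and values here are assumptions for illustration only, so consult the turbinia-client documentation for the authoritative schema:

```json
{
  "default": {
    "API_SERVER_ADDRESS": "http://localhost",
    "API_SERVER_PORT": 8000,
    "API_AUTHENTICATION_ENABLED": false
  }
}
```

The -c flag selects a named instance configuration (such as "default" above), which allows one config file to describe several Turbinia deployments.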

You can also interact with Turbinia directly from Python by using the API library. We provide some examples here.

Other documentation

Obligatory Fine Print

This is not an official Google product (experimental or otherwise), it is just code that happens to be owned by Google.

Contributors

aarontp, alimez, beamcodeup, berggren, c-h-fahy, coryaltheide, dependabot[bot], dfjxs, ericzinnikas, fpiedrah, fryyyyy, giovannt0, hacktobeer, holzmanolagrene, igor8mr, jaegeral, jleaniz, joachimmetz, jorlamd, mwatkins-fb, onager, ramo-j, rgayon, rjcolonna, roshanmaskey, sa3eed3ed, simon-berg, slaynot, tomchop, wajihyassine


turbinia's Issues

Add {pre,post}processor hooks

These can be used to process the evidence or the local node before and after a task runs. Examples are attaching a cloud disk, mounting a disk, decrypting a disk, etc.
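The requested hooks could look something like the sketch below. The names here are illustrative (Turbinia's actual Evidence implementation may differ); the point is that setup and teardown bracket the task so that teardown always runs:

```python
# Illustrative sketch of {pre,post}processor hooks; names are hypothetical.
class Evidence:
    def __init__(self, source):
        self.source = source
        self.mounted = False

    def preprocess(self):
        # e.g. attach a cloud disk, mount it, or decrypt it
        self.mounted = True

    def postprocess(self):
        # undo whatever preprocess() set up
        self.mounted = False

def run_task(evidence, task):
    """Run a task with the evidence prepared, cleaning up even on failure."""
    evidence.preprocess()
    try:
        return task(evidence)
    finally:
        evidence.postprocess()
```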

Easy handling of multiple Turbinia instances from the client

Right now the user has to manually manage separate configs to talk to different Turbinia instances. It would be nice if there was a way to have multiple servers registered under one config, or possibly have a way for servers to register in a cloud project, and have the client be able to easily select from any server instance in a given cloud project.

Autogenerate turbiniactl command args from evidence objects

We should autogenerate all of the commands and command args for turbiniactl based on the evidence object types and attributes so that all evidence types can be added through turbiniactl without needing to manually keep things in sync.
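A minimal sketch of the idea, using argparse subcommands generated from a table of evidence attributes. The evidence type names and attributes below are hypothetical placeholders; the real implementation would introspect the Evidence classes themselves:

```python
import argparse

# Hypothetical evidence definitions used only for illustration.
EVIDENCE_TYPES = {
    "rawdisk": ["source_path"],
    "googleclouddisk": ["disk_name", "project", "zone"],
}

def build_parser():
    """Generate one subcommand per evidence type from its attribute list."""
    parser = argparse.ArgumentParser(prog="turbiniactl")
    subparsers = parser.add_subparsers(dest="evidence_type")
    for name, attrs in EVIDENCE_TYPES.items():
        sub = subparsers.add_parser(name)
        for attr in attrs:
            sub.add_argument(f"--{attr}", required=True)
    return parser
```

With this pattern, adding a new evidence type (or attribute) automatically surfaces it on the command line with no parser changes.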

Create new GCS -> Persistent Disk copy task

This will create a new Persistent Disk with a filesystem slightly larger than the image file in GCS, and will copy the raw image from GCS directly as a file in the Persistent Disk filesystem. This will require a new Evidence type called something like PersistentDiskLocalImage.

turbiniactl status command

We need a way to get the status out of Turbinia remotely after things start. This may mean creating a new pubsub interface or other back-channel mechanism since right now the way to talk to Turbinia remotely is through pubsub (currently one-way).

Reconsider num of workers per host

Right now PSQ starts a worker per core, but some of the underlying tools (e.g. Plaso) do this as well, so it doesn't make sense to have that many worker tasks running. I'm not sure if a single worker per host is the right option either, though.

Add Turbinia instance as datastore namespace

Right now when we save data into Datastore we don't have any way to distinguish which Turbinia instance is saving the entities, which means that we can't have more than one Turbinia instance per cloud project without mixing the task status results together. We should add a new 'instance' (or something similar) as part of the entity key to help disambiguate. We could potentially use the config.PSQ_TOPIC variable here as this should already be unique.
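The disambiguation described above can be sketched with a plain dict standing in for the shared Datastore; in real code the instance name would become a component of the Datastore entity key. The class and method names below are hypothetical:

```python
# Sketch only: a dict stands in for Cloud Datastore, showing how adding an
# 'instance' component (e.g. config.PSQ_TOPIC) to the key lets multiple
# Turbinia instances share one cloud project without mixing task statuses.
class TaskStatusStore:
    def __init__(self, instance, backend):
        self.instance = instance  # unique per Turbinia instance
        self.backend = backend    # shared storage (real code: Datastore)

    def put(self, task_id, status):
        self.backend[(self.instance, task_id)] = status

    def get(self, task_id):
        return self.backend.get((self.instance, task_id))
```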

Fix/Update init scripts

Right now I have a PR for some super basic/hacky start-up scripts, but we should update those into something more legit.

Implement turbiniamgmt workercheck command

Right now we don't have a way to check how many workers are connected, and this can be non-trivial because workers can connect from anywhere. We should implement a 'turbiniactl workercheck' (or similar) command to run a quick check on all the workers. PSQ has a Broadcast worker mechanism that we can use for this.

Add ssdeep/hashdeep job

(There is another open ended issue to track adding multiple job types, but I'm going to start breaking them out into their own issues).

Add support for multi-pools for workers

We can add different worker pools that are created to match individual job types (e.g. so Plaso can have worker nodes with more CPU than workers for other job types). Right now it's one global worker pool.

Update setup.py

It's out of date, and has packages listed from the previous iteration.

Add privilege aware execution handler

Currently Turbinia assumes that it runs as a user with sudo privileges. We should add methods to the TurbiniaTask object to handle executing privileged commands rather than hard-coding sudo into commands being run.
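One way such a method might look (a sketch under the assumption that commands are passed as argument lists, not how Turbinia actually implements it):

```python
import subprocess

def execute(cmd, needs_root=False):
    """Run cmd (a list of args), prefixing sudo only when privileges are
    required, instead of hard-coding 'sudo' into every command string.

    Hypothetical sketch of a method that could live on TurbiniaTask.
    """
    if needs_root:
        # -n: fail rather than prompt for a password on a headless worker
        cmd = ["sudo", "-n"] + cmd
    return subprocess.run(cmd, capture_output=True, text=True)
```

Centralizing this also gives one place to later swap sudo for capabilities, a setuid helper, or a privileged sidecar process.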

Timesketch support

Timesketch has a fancy new Python API. We should upload output to Timesketch where relevant.

Adding errors to TurbiniaTaskResult causes it to be unpickleable

[INFO] Took 0.27 sec
Traceback (most recent call last):
  File "/home/turbinia/src/turbinia/turbiniactl", line 175, in <module>
    worker.listen()
  File "/home/turbinia/turbinia-env/local/lib/python2.7/site-packages/psq/worker.py", line 60, in listen
    self.run_task(task)
  File "/home/turbinia/turbinia-env/local/lib/python2.7/site-packages/psq/worker.py", line 69, in run_task
    task.execute(self.queue)
  File "/home/turbinia/turbinia-env/local/lib/python2.7/site-packages/psq/task.py", line 96, in execute
    queue.storage.put_task(self)
  File "/home/turbinia/turbinia-env/local/lib/python2.7/site-packages/psq/datastore_storage.py", line 61, in put_task
    entity['data'] = dumps(task)
  File "/home/turbinia/turbinia-env/local/lib/python2.7/site-packages/google/cloud/client.py", line 134, in __getstate__
    'Clients have non-trivial state that is local and unpickleable.',
pickle.PicklingError: Pickling client objects is explicitly not supported.
Clients have non-trivial state that is local and unpickleable.
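The error happens because the result object holds a reference to a Google Cloud client, and client objects refuse to pickle. One common workaround (a sketch of the general pattern, not necessarily how Turbinia resolved this issue) is to drop the unpicklable member from the pickled state:

```python
import pickle
import threading

class TaskResult:
    """Illustrative stand-in for TurbiniaTaskResult."""

    def __init__(self):
        self.errors = []
        # A lock is used here as a stand-in for an unpicklable client object.
        self.client = threading.Lock()

    def __getstate__(self):
        state = self.__dict__.copy()
        state["client"] = None  # drop the unpicklable member before pickling
        return state
```

After unpickling on the other side, the consumer re-creates the client lazily instead of relying on the serialized copy.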

Add cloud storage output option

Right now the output for jobs is put into a locally mounted filesystem, and we should allow for the option to put the output directly into cloud storage.

Add a README That Explains this Project

Hey @berggren (& other Googlers!),

I've heard this project referenced multiple times, and I'm super curious, but the README doesn't really give much of an idea what this project is for. Is it like a Salt/Puppet/Ansible thing? Is it an orchestration platform? I know I could read the code, but that's a big investment. A decent README that explains what this project is about would go a long way to helping folks evaluate if it's helpful.

Thanks!

WebUI

We want a simple web UI to start Turbinia jobs.
