
assemblyline-service-extract's Introduction

Extract Service

This Assemblyline service extracts embedded files from file containers (such as ZIP, RAR and 7z archives).

NOTE: This service does not require you to buy any licence

Execution

The service uses the 7zip library to extract files from containers, then resubmits them for analysis.

It will also:

  • Use the Python tnefparse library to parse TNEF files;
  • Use the xxxswf library to extract compressed SWF files;
  • Use unace to extract WinACE-compressed files;
  • Use mstools and a custom script to attempt to decode MSOffice files;
  • Extract attachments from .eml files;
  • Attempt automatic decoding using:
    • A default list of passwords (see section below)
    • An optional user-supplied password (see section below)
    • The body of an .eml file, tokenized twice: once by splitting on whitespace characters, and a second time using the pattern [a-zA-Z0-9]+
  • Use pdfdetach in poppler-utils to extract attachments from pdf samples;
  • Use the NSIS Reversing Suite to recover a preview of the original Setup.nsi;
  • Debloat bloated files:
    • Windows executables: debloat and custom scripts
    • Windows installers (.msi)
    • All other files: a generic entropy-based calculator
  • Integrate the capabilities of the now-archived AutoItRipper service.
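
The password sources listed above could be combined roughly as follows. This is a minimal sketch, not the service's actual code: the helper name `candidate_passwords`, its signature and the ordering (user-supplied password first, then defaults, then email body tokens) are assumptions made for illustration, and the second tokenization is interpreted here as matching runs of [a-zA-Z0-9]+.

```python
import re


def candidate_passwords(default_pw_list, user_password=None, eml_body=""):
    """Build an ordered, de-duplicated list of passwords to try.

    Hypothetical helper: the name, signature and ordering are illustrative.
    """
    candidates = []
    if user_password:
        candidates.append(user_password)
    candidates.extend(default_pw_list)
    # First pass: split the email body on whitespace.
    candidates.extend(eml_body.split())
    # Second pass: keep only alphanumeric runs, which strips punctuation
    # attached to a password (for example "abc123." becomes "abc123").
    candidates.extend(re.findall(r"[a-zA-Z0-9]+", eml_body))
    # dict.fromkeys preserves insertion order while dropping duplicates.
    return list(dict.fromkeys(candidates))
```

Tokenizing the body both ways keeps passwords that contain punctuation (whitespace split) as well as passwords that are merely followed by punctuation (alphanumeric match).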

Once this service has completed its processing, it will block samples from continuing to other services unless they are identified as one of the following file types:

  • Executables
  • Java files
  • Android/APK packages
  • Document files (i.e. Microsoft Office and PDF)
  • Apple/IPA packages

NOTE: This service will avoid adding unnecessary files if the files are known to the system to be safe. This can be overridden by running the service task with deep_scan enabled.

Submission Parameters & Configuration

Parameters:

  • Password: An additional password can be provided to the service on submission to decode a container.
  • Extract PE Sections: Using the 7zip library, the service will extract sections from an executable file.
  • Continue After Extract: When true, Assemblyline will continue to send an .eml sample to other services after any attachments have been extracted.

Config (set by administrator):

  • default_pw_list: List of passwords used when attempting to extract from protected archives.
  • max_email_attachment_size: Maximum size of an attachment to extract from a .eml file.
  • named_email_attachments_only: When true, the service will only extract attachment files from .eml when the file name is provided.
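
To illustrate how the two email-related settings interact, here is a minimal sketch built on Python's standard `email` package. It is not the service's implementation: the function name `select_attachments` is hypothetical, and treating `max_email_attachment_size` as a byte count on the decoded payload is an assumption.

```python
from email import message_from_bytes, policy


def select_attachments(raw_eml, max_email_attachment_size, named_email_attachments_only):
    """Return (filename, payload) pairs for attachments that pass both settings.

    Hypothetical helper; the size limit is assumed to be in bytes.
    """
    message = message_from_bytes(raw_eml, policy=policy.default)
    selected = []
    for part in message.iter_attachments():
        filename = part.get_filename()  # None when no file name is provided
        payload = part.get_payload(decode=True) or b""
        if named_email_attachments_only and not filename:
            continue  # skip unnamed attachments when the flag is set
        if len(payload) > max_email_attachment_size:
            continue  # skip attachments above the configured maximum
        selected.append((filename, payload))
    return selected
```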

assemblyline-service-extract's People

Contributors

cccs-aa, cccs-chrisb, cccs-douglass, cccs-jh, cccs-kevin, cccs-ma, cccs-ml, cccs-rs, cccs-rushi, cccs-sgaron, delirious-lettuce, deloittem, ekkerri-cse, gdesmar, glimps-jbo, hawken93


assemblyline-service-extract's Issues

Clearer error message when a passworded corrupted file cannot be extracted

Follow-up of issue 33 from EmlParser.
The message parsed by EmlParser is corrupted, but EmlParser is able to parse the body correctly, add the attachment to the submission and the password to the list of potential passwords.

The current error returned by Extract is "File 'PACK-715525.xlsm' is encrypted, password required for extraction". It would be valuable to better determine whether the error is caused by a bad password or by a corrupted file, and to return a clearer error to the UI.

Bug: Infinite loop in 7zip extraction

Hi,

I believe I have found an infinite loop bug in the extract service. The bug was suspected after 39 minutes of 100% CPU usage, even though the time limit should have been 60 seconds. The version is 4.2.0.stable26, but I haven't seen any changes to this mechanism in master. This is running in Kubernetes via the Assemblyline Helm chart (v4.2) :)

Stack trace shows for extract.py:

  • 260 (extract)
  • 726 (extract_zip)
  • 802 (extract_zip_7zip)
  • 434 (_7zip_submit_extracted)
  • os.walk

strace:

  • openat /tmp/working_directory/tmp.../extracted_zip = -1 ENOENT (No such file or directory) * repeated indefinitely

Reasons:

  • The loop runs while changes_made is set. When os.walk yields nothing, the code that clears the changes_made flag never runs, so the loop never exits.
  • The folder seems to be made from the unzip operation, but the unzip operation could fail, apparently without making a folder.

Possible solutions:

  • Check if the unzip operation was successful, via return code?
  • The loop on line 433 could be structured so as to clear the flag before the os.walk, so that the loop exits unless os.walk sets it again:

changes_made = True
while changes_made:
    changes_made = False
    for ... in os.walk(path):
        ...
        if this or that:
            changes_made = True
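
A runnable version of the restructured loop might look like the following. It is only a sketch: `walk_until_stable` and the `process_file` callback are hypothetical names, not functions from extract.py, and the up-front directory check is just one possible way to handle an unzip operation that failed without creating a folder.

```python
import os


def walk_until_stable(path, process_file):
    """Re-walk `path` until a full pass makes no changes; return the pass count.

    `process_file` is a hypothetical callback that returns True when it
    changed the tree (for example, by unpacking a nested archive).
    """
    if not os.path.isdir(path):
        # The extraction produced no folder, so there is nothing to walk.
        return 0
    passes = 0
    changes_made = True
    while changes_made:
        # Clear the flag first so an empty walk ends the loop.
        changes_made = False
        passes += 1
        for root, _dirs, files in os.walk(path):
            for name in files:
                if process_file(os.path.join(root, name)):
                    changes_made = True
    return passes
```

Because the flag is cleared at the top of every pass, a walk that yields nothing (including one over a missing directory, which os.walk silently treats as empty) ends the loop instead of spinning.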

Cheers :)

Malicious hta file not scanned after running through extract service

We recently ran into a malicious hta file in our deployment (a credential phishing attempt) that was not detected by Yara, because the extract service dropped the submission and the dispatcher did not send it on to any further services.

The root of the bug is here; the condition should also check against 'code/hta':

https://github.com/CybercentreCanada/assemblyline-service-extract/blob/master/extract/extract.py#L280

if (
    not few_small_files_only
    and not request.file_type.startswith("executable")
    and not request.file_type.startswith("java")
    and not request.file_type.startswith("android")
    and not request.file_type.startswith("document")
    and request.file_type != "ios/ipa"
    and request.file_type != "code/html"
    and request.file_type != "archive/iso"
    and request.file_type != "archive/udf"
    and request.file_type != "archive/vhd"
    and not request.get_param("continue_after_extract")
):
    request.drop()
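
One way the suggested fix could look is sketched below, with the condition pulled out into a standalone function so it can be checked in isolation. This is an illustration only, not the project's actual patch: `should_drop`, `PASS_THROUGH_PREFIXES` and `PASS_THROUGH_TYPES` are hypothetical names, and the only intended behavioral change from the snippet above is the added 'code/hta' entry.

```python
# File type prefixes and exact types that must continue to other services.
PASS_THROUGH_PREFIXES = ("executable", "java", "android", "document")
PASS_THROUGH_TYPES = {
    "ios/ipa",
    "code/html",
    "code/hta",  # proposed addition so hta files are no longer dropped
    "archive/iso",
    "archive/udf",
    "archive/vhd",
}


def should_drop(file_type, few_small_files_only, continue_after_extract):
    """Return True when the sample should not continue to other services."""
    if few_small_files_only or continue_after_extract:
        return False
    # str.startswith accepts a tuple and matches any of its prefixes.
    if file_type.startswith(PASS_THROUGH_PREFIXES):
        return False
    return file_type not in PASS_THROUGH_TYPES
```

Keeping the exempt types in one set makes a missing entry such as 'code/hta' easier to spot than in a long chain of comparisons.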
