
docbleach's Introduction

DocBleach is an advanced Content Disarm and Reconstruction software. Its objective is to remove misbehaving dynamic content from your Office files, or anything else that could threaten the safety of your computer.


Let's assume your job involves working with files from external sources, for instance reading resumes from unknown applicants. You receive a .doc file, your anti-virus doesn't detect it as harmful, and you decide to open it anyway. You get infected. Had you sanitized the document with DocBleach first, chances are you would not have been infected, because the dynamic content would not have run.

Howto's

To build DocBleach, use Maven:

$ mvn clean package
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10.696 s
[INFO] Finished at: 2016-12-19T17:36:10+01:00
[INFO] Final Memory: 29M/234M
[INFO] ------------------------------------------------------------------------

The final jar is stored in cli/target/docbleach.jar.

To use DocBleach, you may either use the Web Interface or run it from the command line:

java -jar docbleach.jar -in unsafe_document.doc -out safe_doc.doc

The input file may be a relative/absolute path, a URI (think: http:// link), or a dash (-).

The output file may be a relative/absolute path, or a dash (-).

If a dash is given, the input will be taken from stdin, and the output will be sent to stdout.

DocBleach's information (removed threats, errors, ...) is sent to stderr.

Advanced usage

Get the sources

    git clone https://github.com/docbleach/DocBleach.git
    cd DocBleach
    mvn install
    # Import it as a Maven project in your favorite IDE

You've developed a cool new feature? Fixed an annoying bug? We'd be happy to hear from you!

Run the tests

The tests run with JUnit 5, which is perfectly integrated in Maven. To run tests, just run mvn test. You should get something similar to this:

[INFO] Scanning for projects...
...
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Dec 19, 2016 5:33:54 PM org.junit.platform.launcher.core.ServiceLoaderTestEngineRegistry loadTestEngines
INFO: Discovered TestEngines with IDs: [junit-jupiter]
Running org.docbleach.bleach.PdfBleachTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.246 sec - in org.docbleach.bleach.PdfBleachTest
Running org.docbleach.bleach.OLE2BleachTest

Results :

Tests run: 13, Failures: 0, Errors: 0, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3.252 s
[INFO] Finished at: 2016-12-19T17:33:55+01:00
[INFO] Final Memory: 19M/211M
[INFO] ------------------------------------------------------------------------

BUILD SUCCESS confirms that all the tests ran successfully.

Related links

Releases

The releases are available as Windows executables that don't depend on Java, thanks to the Excelsior Jet technology.

License

See LICENSE.

Project Status

Don't expect the code base to change every day, but feel free to contribute: new ideas are more than welcome, and threats evolve - so should we.

Some things would be awesome, though:

  • Adding a way to configure bleaches
  • Writing tests!
  • Writing more content to show and explain how the sanitation process works, and why it works.
  • Adding more stats!


docbleach's Issues

Upgrade poi-ooxml from 3.17 to 4.0

Hello, we are using DocBleach along with Apache Tika which recently released version 1.19. After upgrading we started receiving the exception java.lang.ClassNotFoundException: org.apache.poi.poifs.filesystem.NPOIFSFileSystem.

Upon further investigation we found that Tika is now using v4.0 of the org.apache.poi:poi-ooxml module. The v4 release of this module has breaking changes from the v3.17 that DocBleach is using as they renamed the NPOIFSFileSystem class to POIFSFileSystem (https://poi.apache.org/changes.html). Please consider upgrading this module so that it is compatible with the latest version of Tika.
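
To check which side of the rename a given classpath is on, a small reflection probe can help. This is only a diagnostic sketch: the class name is the one quoted in the exception above, and `PoiClassProbe` is a hypothetical helper, not part of DocBleach or POI.

```java
/** Hypothetical diagnostic helper: reports whether a class can be loaded. */
final class PoiClassProbe {
    // Class named in the ClassNotFoundException reported in this issue.
    static final String LEGACY_CLASS = "org.apache.poi.poifs.filesystem.NPOIFSFileSystem";

    /** Returns true if the named class is loadable, without initializing it. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, PoiClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

Calling `PoiClassProbe.isLoadable(PoiClassProbe.LEGACY_CLASS)` in the application that combines Tika and DocBleach would be expected to return false once POI 4.0 is the version on the classpath, which matches the exception described here.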

Thanks

Other gateways

DocBleach would work better if it was integrated as a mail Gateway, Endpoint gateway (USB filter), Web proxy…
If someone wants to give it a try, it should be quite easy using the BleachSession.sanitize method! :-)

Add support for OpenOffice formats

Hi,

I tried to bleach an open office document (.odt) file containing a macro. DocBleach did not report any threats and declared the document to be safe.

Running docbleach with the -vv switch revealed that it recognised the file as a zip archive (which is somewhat correct). So this issue is both a feature request (add support for OpenOffice formats) and a bug report: DocBleach really should not declare an OpenOffice file with macros to be safe.

Steps to reproduce:

  1. Create a .odt file with a macro.
  2. Run DocBleach on the file: java -jar docbleach.jar -in file.odt -out file-out.odt -vv

Expected behaviour:
Remove the macro from the file.

Actual behaviour:
The file is declared to be safe.

Best regards,
OOTS
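
A first step towards this could be a conservative check that refuses to call such a file safe. The sketch below assumes that OpenDocument packages keep Basic macro libraries under a `Basic/` folder and other scripts under `Scripts/`, which is how OpenOffice and LibreOffice usually lay them out; that layout is an assumption here, not something DocBleach implements. `OdfMacroProbe` is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: flags zip packages that appear to carry ODF macros. */
final class OdfMacroProbe {
    /** Returns true if any entry lives under Basic/ or Scripts/ (assumed macro folders). */
    static boolean looksLikeOdfWithMacros(InputStream in) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.startsWith("Basic/") || name.startsWith("Scripts/")) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

A positive result only says the file deserves attention; actually removing the macro would also require updating the package manifest, which this sketch does not attempt.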

Configuration

DocBleach removes all the potential threats it finds. Sometimes users want to keep macros, because they trust themselves (why not?).

-> DocBleach should allow bleaches to be configured.

Linked issue: #2, configuration should have an effect on the first two layers (specific & format bleach) to enable/disable a specific bleach or to configure it.

PdfBleach: improve magic header detection

On the PDF reference manual, the file header section stipulates that:

Acrobat viewers require only that the header appear somewhere within
the first 1024 bytes of the file.

In the same section, we also notice that:

Acrobat viewers also accept a header of the form %!PS-Adobe-N.n PDF-M.m

Since users can still open such files correctly in their PDF viewer, DocBleach should make sure it handles these documents too.
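
The two rules quoted above could be checked roughly as follows. This is a minimal sketch, not PdfBleach's actual detection code: `PdfHeaderSniffer` is a hypothetical name, and the sketch assumes the whole header must fit inside the first 1024 bytes and that version numbers are single digits.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

/** Hypothetical helper: looks for a PDF header in the first 1024 bytes. */
final class PdfHeaderSniffer {
    private static final int SEARCH_WINDOW = 1024;
    // Regular header, e.g. "%PDF-1.4".
    private static final Pattern PDF = Pattern.compile("%PDF-\\d\\.\\d");
    // Alternative header accepted by Acrobat viewers, e.g. "%!PS-Adobe-3.0 PDF-1.5".
    private static final Pattern PS = Pattern.compile("%!PS-Adobe-\\d\\.\\d PDF-\\d\\.\\d");

    static boolean hasPdfHeader(byte[] data) {
        int len = Math.min(data.length, SEARCH_WINDOW);
        // ISO-8859-1 maps every byte to one char, so binary junk cannot break decoding.
        String head = new String(data, 0, len, StandardCharsets.ISO_8859_1);
        return PDF.matcher(head).find() || PS.matcher(head).find();
    }
}
```

Searching a window instead of comparing only the first bytes is what lets a file with leading junk still be recognised as a PDF.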

CLI: handle directories

The CLI should accept directory names: sanitise every file in an input directory and write the results to another directory.
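
One way to approach this is to first compute which output path each input file maps to, and then sanitise the pairs one by one. The sketch below covers only that planning step; `DirectoryPlan` is a hypothetical helper, and mirroring sub-directories is an assumption about the desired behaviour.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Stream;

/** Hypothetical helper: pairs every input file with its output location. */
final class DirectoryPlan {
    /** Maps each regular file under inDir to the same relative path under outDir. */
    static Map<Path, Path> plan(Path inDir, Path outDir) throws IOException {
        Map<Path, Path> result = new LinkedHashMap<>();
        try (Stream<Path> files = Files.walk(inDir)) {
            files.filter(Files::isRegularFile)
                 .sorted()
                 .forEach(src -> result.put(src, outDir.resolve(inDir.relativize(src))));
        }
        return result;
    }
}
```

Each pair could then be passed to the same code path that handles a single -in/-out invocation, creating parent directories as needed.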

Unable to sanitize new DOCX document

I tried the latest branch of DocBleach, and sanitizing reported success. However, the sanitized .docx file shows unreadable content, and the sanitized .doc file had all of its contents removed.

Refactoring the code base

I think it would be great to separate the code base into multiple chunks of code: maybe work with layers?
In my head, there are 4 layers:

  • Specific Bleach for a format+target, for instance "remove macro from OOXML office document".
  • Format Bleach, responsible for cleaning all threats of a given format.
  • The Bleach Factory™, responsible for cleaning a given thing no matter what it is (and so, for guessing its type)
  • 8th Layer - the user

This way, DocBleach would be a library, the "format bleaches" would be like addons/plugins, and calling docbleach.jar would launch rockets in every direction.

Handle zipped payloads

If the submitted document is zipped (.zip?), DocBleach could unzip it, bleach every document inside, and re-zip the results into an archive that is sent back to the client.
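
The unzip, bleach, re-zip loop could look roughly like this. It is a sketch under simplifying assumptions: the per-document sanitizer is passed in as a plain function (a stand-in for whatever bleach applies), every entry is buffered in memory, and there is no protection against zip bombs. `ZipRebuilder` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.function.UnaryOperator;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: rewrites an archive, sanitizing each entry's bytes. */
final class ZipRebuilder {
    static byte[] rebuild(byte[] zipBytes, UnaryOperator<byte[]> sanitizer) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes));
             ZipOutputStream zout = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                // A fresh entry lets the output stream recompute sizes and CRC.
                zout.putNextEntry(new ZipEntry(entry.getName()));
                if (!entry.isDirectory()) {
                    zout.write(sanitizer.apply(readAll(zin)));
                }
                zout.closeEntry();
            }
        }
        return out.toByteArray();
    }

    /** Reads the current entry to its end. */
    private static byte[] readAll(ZipInputStream zin) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = zin.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }
}
```

A production version would want limits on entry count and uncompressed size before trusting an archive from an unknown source.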

Improve the console output

Linked issue: #2
As of now, the number of potential threats removed is the only information displayed.

Having two output formats would be fine, with a toggle: 👨 (textual) & 🤖 (json).

Useful information to include:
For the 👨 display: name & location of the threat ("Macro executed when the document is opened", "ActiveX object in the 2nd page", ...), and the list of removed content
For the 🤖 output: same thing, with internal data?

Error in installing

Please help me to solve this error:

Java version: openjdk version "1.8.0_151"
mvn version: Apache Maven 3.3.9

Tests run: 0, Failures: 0, Errors: 0, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] DocBleach Parent ................................... SUCCESS [  0.761 s]
[INFO] DocBleach API ...................................... SUCCESS [  4.380 s]
[INFO] DocBleach Modules .................................. SUCCESS [  0.026 s]
[INFO] Office-Bleach ...................................... FAILURE [  3.570 s]
[INFO] PDF-Bleach ......................................... SKIPPED
[INFO] RTF-Bleach ......................................... SKIPPED
[INFO] Zip-Bleach ......................................... SKIPPED
[INFO] DocBleach CLI ...................................... SKIPPED
[INFO] DocBleach Web Server ............................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 9.246 s
[INFO] Finished at: 2018-07-07T13:43:36+04:00
[INFO] Final Memory: 22M/86M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.19.1:test (default-test) on project module-office: Execution default-test of goal org.apache.maven.plugins:maven-surefire-plugin:2.19.1:test failed: An API incompatibility was encountered while executing org.apache.maven.plugins:maven-surefire-plugin:2.19.1:test: java.lang.NoSuchMethodError: org.apache.maven.surefire.util.internal.StringUtils.requireNonNull(Ljava/lang/Object;)Ljava/lang/Object;
[ERROR] -----------------------------------------------------
[ERROR] realm =    plugin>org.apache.maven.plugins:maven-surefire-plugin:2.19.1
[ERROR] strategy = org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy
[ERROR] urls[0] = file:/root/.m2/repository/org/apache/maven/plugins/maven-surefire-plugin/2.19.1/maven-surefire-plugin-2.19.1.jar
[ERROR] urls[1] = file:/root/.m2/repository/org/junit/platform/junit-platform-surefire-provider/1.3.0-M1/junit-platform-surefire-provider-1.3.0-M1.jar
[ERROR] urls[2] = file:/root/.m2/repository/org/apiguardian/apiguardian-api/1.0.0/apiguardian-api-1.0.0.jar
[ERROR] urls[3] = file:/root/.m2/repository/org/junit/platform/junit-platform-launcher/1.3.0-M1/junit-platform-launcher-1.3.0-M1.jar
[ERROR] urls[4] = file:/root/.m2/repository/org/apache/maven/surefire/common-java5/2.22.0/common-java5-2.22.0.jar
[ERROR] urls[5] = file:/root/.m2/repository/org/junit/jupiter/junit-jupiter-engine/5.3.0-M1/junit-jupiter-engine-5.3.0-M1.jar
[ERROR] urls[6] = file:/root/.m2/repository/org/junit/platform/junit-platform-engine/1.3.0-M1/junit-platform-engine-1.3.0-M1.jar
[ERROR] urls[7] = file:/root/.m2/repository/org/junit/platform/junit-platform-commons/1.3.0-M1/junit-platform-commons-1.3.0-M1.jar
[ERROR] urls[8] = file:/root/.m2/repository/org/opentest4j/opentest4j/1.1.0/opentest4j-1.1.0.jar
[ERROR] urls[9] = file:/root/.m2/repository/org/junit/jupiter/junit-jupiter-api/5.3.0-M1/junit-jupiter-api-5.3.0-M1.jar
[ERROR] urls[10] = file:/root/.m2/repository/org/apache/maven/surefire/maven-surefire-common/2.19.1/maven-surefire-common-2.19.1.jar
[ERROR] urls[11] = file:/root/.m2/repository/org/apache/maven/surefire/surefire-booter/2.19.1/surefire-booter-2.19.1.jar
[ERROR] urls[12] = file:/root/.m2/repository/org/codehaus/plexus/plexus-utils/1.5.15/plexus-utils-1.5.15.jar
[ERROR] urls[13] = file:/root/.m2/repository/junit/junit/4.12/junit-4.12.jar
[ERROR] urls[14] = file:/root/.m2/repository/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar
[ERROR] urls[15] = file:/root/.m2/repository/backport-util-concurrent/backport-util-concurrent/3.1/backport-util-concurrent-3.1.jar
[ERROR] urls[16] = file:/root/.m2/repository/org/codehaus/plexus/plexus-interpolation/1.11/plexus-interpolation-1.11.jar
[ERROR] urls[17] = file:/root/.m2/repository/org/slf4j/slf4j-jdk14/1.5.6/slf4j-jdk14-1.5.6.jar
[ERROR] urls[18] = file:/root/.m2/repository/org/slf4j/jcl-over-slf4j/1.5.6/jcl-over-slf4j-1.5.6.jar
[ERROR] urls[19] = file:/root/.m2/repository/org/apache/maven/reporting/maven-reporting-api/3.0/maven-reporting-api-3.0.jar
[ERROR] urls[20] = file:/root/.m2/repository/org/sonatype/plexus/plexus-sec-dispatcher/1.3/plexus-sec-dispatcher-1.3.jar
[ERROR] urls[21] = file:/root/.m2/repository/org/sonatype/plexus/plexus-cipher/1.4/plexus-cipher-1.4.jar
[ERROR] urls[22] = file:/root/.m2/repository/org/apache/commons/commons-lang3/3.1/commons-lang3-3.1.jar
[ERROR] urls[23] = file:/root/.m2/repository/org/apache/maven/surefire/surefire-api/2.19.1/surefire-api-2.19.1.jar
[ERROR] urls[24] = file:/root/.m2/repository/org/apache/maven/plugin-tools/maven-plugin-annotations/3.3/maven-plugin-annotations-3.3.jar
[ERROR] Number of foreign imports: 1
[ERROR] import: Entry[import  from realm ClassRealm[maven.api, parent: null]]
[ERROR]

DOCM corrupt after docbleach

I run a simple test with a Docm containing a macro:

java -jar docbleach.jar -in Doc1.docm -out out.docm -vv
[main] DEBUG xyz.docbleach.Main - Log Level: TRACE
[main] DEBUG xyz.docbleach.Main - Checking output name : out.docm
[main] DEBUG xyz.docbleach.Main - Checking input name : Doc1.docm
[main] DEBUG xyz.docbleach.BleachSession - First 8 bytes: [80, 75, 3, 4, 20, 0, 6, 0]
[main] DEBUG xyz.docbleach.BleachSession - Found bleach for this file type: Office Bleach
[main] TRACE xyz.docbleach.bleach.OOXMLBleach - File opened
[main] TRACE xyz.docbleach.bleach.OOXMLBleach - Part name: /_rels/.rels
[main] DEBUG xyz.docbleach.bleach.OOXMLBleach - Content type: application/vnd.openxmlformats-package.relationships+xml for part /_rels/.rels
[main] TRACE xyz.docbleach.bleach.OOXMLBleach - Part name: /docProps/app.xml
[main] DEBUG xyz.docbleach.bleach.OOXMLBleach - Content type: application/vnd.openxmlformats-officedocument.extended-properties+xml for part /docProps/app.xml
[main] TRACE xyz.docbleach.bleach.OOXMLBleach - Part name: /docProps/core.xml
[main] DEBUG xyz.docbleach.bleach.OOXMLBleach - Content type: application/vnd.openxmlformats-package.core-properties+xml for part /docProps/core.xml
[main] TRACE xyz.docbleach.bleach.OOXMLBleach - Part name: /word/_rels/document.xml.rels
[main] DEBUG xyz.docbleach.bleach.OOXMLBleach - Content type: application/vnd.openxmlformats-package.relationships+xml for part /word/_rels/document.xml.rels
[main] TRACE xyz.docbleach.bleach.OOXMLBleach - Part name: /word/_rels/vbaProject.bin.rels
[main] DEBUG xyz.docbleach.bleach.OOXMLBleach - Content type: application/vnd.openxmlformats-package.relationships+xml for part /word/_rels/vbaProject.bin.rels
[main] TRACE xyz.docbleach.bleach.OOXMLBleach - Part name: /word/document.xml
[main] DEBUG xyz.docbleach.bleach.OOXMLBleach - Content type: application/vnd.ms-word.document.macroEnabled.main+xml for part /word/document.xml
[main] DEBUG xyz.docbleach.bleach.OOXMLBleach - Found and removed suspicious content type: 'application/vnd.ms-word.document.macroEnabled.main+xml' in '/word/document.xml' (Size: -1)
[main] TRACE xyz.docbleach.bleach.OOXMLBleach - Part name: /word/fontTable.xml
[main] DEBUG xyz.docbleach.bleach.OOXMLBleach - Content type: application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml for part /word/fontTable.xml
[main] TRACE xyz.docbleach.bleach.OOXMLBleach - Part name: /word/settings.xml
[main] DEBUG xyz.docbleach.bleach.OOXMLBleach - Content type: application/vnd.openxmlformats-officedocument.wordprocessingml.settings+xml for part /word/settings.xml
[main] TRACE xyz.docbleach.bleach.OOXMLBleach - Part name: /word/styles.xml
[main] DEBUG xyz.docbleach.bleach.OOXMLBleach - Content type: application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml for part /word/styles.xml
[main] TRACE xyz.docbleach.bleach.OOXMLBleach - Part name: /word/stylesWithEffects.xml
[main] DEBUG xyz.docbleach.bleach.OOXMLBleach - Content type: application/vnd.ms-word.stylesWithEffects+xml for part /word/stylesWithEffects.xml
[main] TRACE xyz.docbleach.bleach.OOXMLBleach - Part name: /word/theme/theme1.xml
[main] DEBUG xyz.docbleach.bleach.OOXMLBleach - Content type: application/vnd.openxmlformats-officedocument.theme+xml for part /word/theme/theme1.xml
[main] TRACE xyz.docbleach.bleach.OOXMLBleach - Part name: /word/vbaData.xml
[main] DEBUG xyz.docbleach.bleach.OOXMLBleach - Content type: application/vnd.ms-word.vbaData+xml for part /word/vbaData.xml
[main] DEBUG xyz.docbleach.bleach.OOXMLBleach - Found and removed suspicious content type: 'application/vnd.ms-word.vbaData+xml' in '/word/vbaData.xml' (Size: -1)
[main] TRACE xyz.docbleach.bleach.OOXMLBleach - Part name: /word/vbaProject.bin
[main] DEBUG xyz.docbleach.bleach.OOXMLBleach - Content type: application/vnd.ms-office.vbaProject for part /word/vbaProject.bin
[main] DEBUG xyz.docbleach.bleach.OOXMLBleach - Found and removed suspicious content type: 'application/vnd.ms-office.vbaProject' in '/word/vbaProject.bin' (Size: -1)
[main] TRACE xyz.docbleach.bleach.OOXMLBleach - Part name: /word/webSettings.xml
[main] DEBUG xyz.docbleach.bleach.OOXMLBleach - Content type: application/vnd.openxmlformats-officedocument.wordprocessingml.webSettings+xml for part /word/webSettings.xml
[main] WARN xyz.docbleach.Main - Sanitized file has been saved, 3 potential threat(s) removed.

Word cannot open the resulting out.docm anymore:

untitled

Any ideas what to fix?

Office: Remove DDE relations

https://pwndizzle.blogspot.fr/2017/03/office-document-macros-ole-actions-dde.html

Samples (they both open calc.exe)

DDE.xlsx
DDE.xls.zip

[main] TRACE xyz.docbleach.module.ooxml.OOXMLBleach - Part name: /xl/externalLinks/externalLink1.xml
[main] DEBUG xyz.docbleach.module.ooxml.OOXMLBleach - Content type: application/vnd.openxmlformats-officedocument.spreadsheetml.externalLink+xml for part /xl/externalLinks/externalLink1.xml


[main] DEBUG xyz.docbleach.module.ooxml.OOXMLBleach - Relation type 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/externalLink' found from 'Name: /xl/workbook.xml - Content Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml' to '/xl/externalLinks/externalLink1.xml'

Content of externalLink1.xml:

<externalLink xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14" xmlns:x14="http://schemas.microsoft.com/office/spreadsheetml/2009/9/main"><ddeLink xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" ddeService="cmd" ddeTopic="/c calc"><ddeItems><ddeItem name="A0" advise="1"/><ddeItem name="StdDocumentName" ole="1" advise="1"/></ddeItems></ddeLink></externalLink>

I haven't found a clean way to remove it for the moment. Fix might look like what's discussed in #28 

Excel 2003: Sanitised files still have a reference to the Macros

When OLE2 files (up to Office 2003) are sanitised and macros are removed, the "Workbook"/"Document" entry still references them, and Office displays an error or a warning.

Help asked on the Apache POI Mailing list, in hope someone has a clue on how to fix it.

This has no impact on the sanitation process, but displays ugly warnings.

Excel warning 1

Excel warning 2

Cannot Build Good CLI

Hi, I'm new to git/maven but have a successful build of docbleach. The jar file corrupts every file I feed it. I have been using the last release / Windows installer from December 2017 without issues, but wanted the new improvements. Files run through this current branch come out at 4K (for a docx that started as 2000K). No errors - the output says the file is good, so I'm just copying it. Word says the file is corrupt and cannot be opened. Any ideas? The only issues during compile are 2 warnings:
[WARNING] Some problems were encountered while building the effective model for xyz.docbleach:cli:jar:0.0.1-SNAPSHOT
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-jar-plugin is missing. @ line 89, column 15
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-deploy-plugin is missing. @ line 133, column 15
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.

and this: [WARNING] Discovered module-info.class. Shading will break its strong encapsulation.

but, green build. Running the cli/target jar file.
I have asked a developer/coder to compile and we get the same results. Any help..... Thanks

OOXML: Rewrite content type for macroEnabled documents

Macro Enabled content types need to be rewritten to their "normal" types.
Full list of content types to consider (to be checked using Microsoft's official list):

Extension  MIME Type
.docx    application/vnd.openxmlformats-officedocument.wordprocessingml.document
.dotx     application/vnd.openxmlformats-officedocument.wordprocessingml.template
.docm     application/vnd.ms-word.document.macroEnabled.12
.dotm     application/vnd.ms-word.template.macroEnabled.12

.xlsx     application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
.xltx     application/vnd.openxmlformats-officedocument.spreadsheetml.template
.xlsm     application/vnd.ms-excel.sheet.macroEnabled.12
.xltm     application/vnd.ms-excel.template.macroEnabled.12
.xlam     application/vnd.ms-excel.addin.macroEnabled.12
.xlsb     application/vnd.ms-excel.sheet.binary.macroEnabled.12

.pptx     application/vnd.openxmlformats-officedocument.presentationml.presentation
.potx     application/vnd.openxmlformats-officedocument.presentationml.template
.ppsx     application/vnd.openxmlformats-officedocument.presentationml.slideshow
.ppam     application/vnd.ms-powerpoint.addin.macroEnabled.12
.pptm     application/vnd.ms-powerpoint.presentation.macroEnabled.12
.potm     application/vnd.ms-powerpoint.template.macroEnabled.12
.ppsm     application/vnd.ms-powerpoint.slideshow.macroEnabled.12

Source


Because of this, Office for Mac may display this warning before opening the files:
Office for Mac - Enable Macros
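
The rewrite itself can be a plain lookup. The sketch below pairs each macro-enabled type from the table with the regular type of the matching extension (.docm to .docx, .xlsm to .xlsx, and so on); that pairing is an assumption drawn from the extension names, and .xlam, .xlsb and .ppam are left untouched because the table lists no regular counterpart for them. `MacroContentTypes` is a hypothetical name, and the part-level types that appear in the OOXMLBleach logs (such as application/vnd.ms-word.document.macroEnabled.main+xml) are different strings that this map does not cover.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: maps macro-enabled MIME types to their regular equivalents. */
final class MacroContentTypes {
    private static final Map<String, String> NORMAL_TYPE = new LinkedHashMap<>();

    static {
        String ooxml = "application/vnd.openxmlformats-officedocument.";
        // Word
        NORMAL_TYPE.put("application/vnd.ms-word.document.macroEnabled.12", ooxml + "wordprocessingml.document");
        NORMAL_TYPE.put("application/vnd.ms-word.template.macroEnabled.12", ooxml + "wordprocessingml.template");
        // Excel
        NORMAL_TYPE.put("application/vnd.ms-excel.sheet.macroEnabled.12", ooxml + "spreadsheetml.sheet");
        NORMAL_TYPE.put("application/vnd.ms-excel.template.macroEnabled.12", ooxml + "spreadsheetml.template");
        // PowerPoint
        NORMAL_TYPE.put("application/vnd.ms-powerpoint.presentation.macroEnabled.12", ooxml + "presentationml.presentation");
        NORMAL_TYPE.put("application/vnd.ms-powerpoint.template.macroEnabled.12", ooxml + "presentationml.template");
        NORMAL_TYPE.put("application/vnd.ms-powerpoint.slideshow.macroEnabled.12", ooxml + "presentationml.slideshow");
    }

    /** Returns the regular type for a known macro-enabled type, else the input unchanged. */
    static String rewrite(String contentType) {
        return NORMAL_TYPE.getOrDefault(contentType, contentType);
    }
}
```

Returning unknown types unchanged keeps the rewrite safe to apply to every content type in a package.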

OLE2: Filter based on Root Class ID

Embedded OLE files are a threat. We know it.
But some Office Addins depend on them.

Sample legitimate classid: 3EAB3858-A0E0-4A3B-A405-F4D525E85265, D52B1FA2-1EF8-4035-9DA6-8AD0F40267A1
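
A filter on the root class ID could start as a simple allowlist. The sketch below uses only the two sample class IDs given above; `ClassIdFilter` is a hypothetical name, and accepting optional braces and any letter case is an assumption about how class IDs may be written.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: allowlist of OLE root class IDs considered legitimate. */
final class ClassIdFilter {
    // The two sample class IDs quoted in this issue.
    private static final Set<String> ALLOWED = new HashSet<>(Arrays.asList(
        "3EAB3858-A0E0-4A3B-A405-F4D525E85265",
        "D52B1FA2-1EF8-4035-9DA6-8AD0F40267A1"));

    /** Accepts IDs with or without surrounding braces, in any letter case. */
    static boolean isAllowed(String classId) {
        String id = classId.trim();
        if (id.length() >= 2 && id.startsWith("{") && id.endsWith("}")) {
            id = id.substring(1, id.length() - 1);
        }
        return ALLOWED.contains(id.toUpperCase(Locale.ROOT));
    }
}
```

An allowlist fails closed: an embedded object with an unknown class ID would still be removed, which fits the "embedded OLE files are a threat" stance.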

Useful links:

    name = "office_vuln_guid"
    description = "GUIDs known to be associated with a CVE were requested (may be False Positive)"
    severity = 3
    categories = ["office"]
    authors = ["Niels Warnars @ Cuckoo Technologies"]
    minimum = "2.0"

    bad_guids = {
        "BDD1F04B-858B-11D1-B16A-00C0F0283628": "CVE-2012-0158",
        "996BF5E0-8044-4650-ADEB-0B013914E99C": "CVE-2012-0158",
        "C74190B6-8589-11d1-B16A-00C0F0283628": "CVE-2012-0158",
        "9181DC5F-E07D-418A-ACA6-8EEA1ECB8E9E": "CVE-2012-0158",
        "1EFB6596-857C-11D1-B16A-00C0F0283628": "CVE-2012-1856",
        "66833FE6-8583-11D1-B16A-00C0F0283628": "CVE-2012-1856",
        "1EFB6596-857C-11D1-B16A-00C0F0283628": "CVE-2013-3906",
        "DD9DA666-8594-11D1-B16A-00C0F0283628": "CVE-2014-1761",
        "00000535-0000-0010-8000-00AA006D2EA4": "CVE-2015-0097",
        "0E59F1D5-1FBE-11D0-8FF2-00A0D10038BC": "CVE-2015-0097",
        "05741520-C4EB-440A-AC3F-9643BBC9F847": "CVE-2015-1641",
        "A08A033D-1A75-4AB6-A166-EAD02F547959": "CVE-2015-1641",
        "F4754C9B-64F5-4B40-8AF4-679732AC0607": "CVE-2015-1641",
        "4C599241-6926-101B-9992-00000B65C6F9": "CVE-2015-2424",
        "44F9A03B-A3EC-4F3B-9364-08E0007F21DF": "CVE-2015-2424",

PdfBleach: Document encryption

Hi, I noticed that after sanitising a document there is some sort of encryption on it. What is it for?

docbleach encryption
Secondly, I am trying to batch-encrypt a lot of PDF documents after sanitising them with docbleach, and I am not able to encrypt them. Can you tell me what's happening here?

Properly handle non-UTF8 filenames in Zip

	at java.base/java.lang.StringCoding.throwMalformed(StringCoding.java:685)
	at java.base/java.lang.StringCoding.decodeUTF8_0(StringCoding.java:768)
	at java.base/java.lang.StringCoding.newStringUTF8NoRepl(StringCoding.java:965)
	at java.base/java.lang.System$2.newStringUTF8NoRepl(System.java:2197)
	at java.base/java.util.zip.ZipCoder$UTF8.toString(ZipCoder.java:60)
	at java.base/java.util.zip.ZipCoder.toString(ZipCoder.java:87)
	at java.base/java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:301)
	at java.base/java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:123)
	at xyz.docbleach.module.zip.ArchiveBleach.sanitize(ArchiveBleach.java:44)
	at xyz.docbleach.api.bleach.CompositeBleach.sanitize(CompositeBleach.java:74)
	at xyz.docbleach.api.BleachSession.sanitize(BleachSession.java:71)
	at xyz.docbleach.cli.Main.sanitize(Main.java:81)
	at xyz.docbleach.cli.Main.main(Main.java:54)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
	... 13 more

Process finished with exit code 1

Sample file: e35d68feda25f401da03883da1e9c437

Archive:  VirusShare_e35d68feda25f401da03883da1e9c437
Zip file size: 1978422 bytes, number of entries: 4
drwx---     3.1 fat        0 bx stor 13-Jun-24 09:44 bulletstorm-trainer18/
-rwxa--     3.1 fat  2030080 bx defN 11-Mar-12 21:16 bulletstorm-trainer18/BS+28Tr-LinGon.exe
-rw-a--     3.1 fat      893 tx defN 13-Jun-24 09:48 bulletstorm-trainer18/+�+���+��+��.txt
-rw-a--     3.1 fat      151 tx defN 13-Mar-29 17:14 bulletstorm-trainer18/+�+���+��+��.url
4 files, 2031124 bytes uncompressed, 1977223 bytes compressed:  2.7%
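
The stack trace shows the failure inside ZipInputStream.getNextEntry while decoding an entry name as UTF-8. One possible mitigation is to retry with a single-byte charset when that happens. The sketch below is an illustration, not ArchiveBleach's code: `ZipNameReader` is a hypothetical name, and ISO-8859-1 is chosen only because it can decode any byte sequence, so the recovered names may not match what the archive's author saw.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: lists entry names, tolerating non-UTF-8 file names. */
final class ZipNameReader {
    /** Tries UTF-8 first, then falls back to ISO-8859-1 if a name is malformed. */
    static List<String> entryNames(byte[] zipBytes) throws IOException {
        try {
            return entryNames(zipBytes, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException | ZipException e) {
            // Malformed UTF-8 in a name: re-read the archive from the start.
            return entryNames(zipBytes, StandardCharsets.ISO_8859_1);
        }
    }

    static List<String> entryNames(byte[] zipBytes, Charset charset) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes), charset)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }
}
```

The retry needs the archive to be re-readable, which is why the sketch works on a byte array; a streaming CLI would have to buffer the input or write it to a temporary file first.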

Improve the JSON output

Ideas:

  • Add a "top threat severity" score
  • Let threats mimic the file structure. The "API" was designed before recursive bleaches were introduced, so they were not taken into account

Word 2003 not sanitized properly

Linked issues: #14, #16

When OLE2 (files up to Office 2003) are sanitised, and Macros are removed, the "Workbook"/"Document" entry still knows about them and generates an error or a Warning.

This has no impact on the sanitation process, but displays ugly warnings.

capture d ecran 2017-05-02 at 3 34 33 am

The VB Project information is probably stored in the Document/0Table/1Table stream (not confirmed yet).

Improve the tests suite

Unit tests should be improved

To be useful, the bleach would have to be split into smaller chunks (the PDF Bleach being ~500 lines is awful)

Integration/Functional/Whatever tests

Having a corpus of documents would be awesome, using olevba/pdfid/grep/file/zip to detect multiple points. The docs would need to be non sensitive/copyrighted documents.

  • the file looks like it's a valid file (mime type, Office OOXML document contains a rels file, ...)
  • No threats recognised by another tool are present
  • Scan the file using VirusTotal?

I have started a corpus, and will work harder to have one file of each format and for each threat.

ArchiveBleach always says that the original / compressed size of each entry is -1

While working on any docx document (produced with Microsoft Office 2010, in my case) and in verbose mode (-vv), ArchiveBleach tells me that the original and compressed size of the documents in the archive is -1, which is very likely to be incorrect.

[main] TRACE xyz.docbleach.api.bleach.CompositeBleach - Using bleach: Zip Bleach
[main] TRACE xyz.docbleach.module.zip.ArchiveBleach - Entry: _rels/.rels - Size: (original: -1, compressed: -1)
[main] TRACE xyz.docbleach.module.zip.ArchiveBleach - Entry: [Content_Types].xml - Size: (original: -1, compressed: -1)
[main] TRACE xyz.docbleach.module.zip.ArchiveBleach - Entry: bleach/bleach - Size: (original: -1, compressed: -1)
[main] TRACE xyz.docbleach.module.zip.ArchiveBleach - Entry: docProps/app.xml - Size: (original: -1, compressed: -1)
[main] TRACE xyz.docbleach.module.zip.ArchiveBleach - Entry: docProps/core.xml - Size: (original: -1, compressed: -1)
[main] TRACE xyz.docbleach.module.zip.ArchiveBleach - Entry: word/document.xml - Size: (original: -1, compressed: -1)
[main] TRACE xyz.docbleach.module.zip.ArchiveBleach - Entry: word/_rels/document.xml.rels - Size: (original: -1, compressed: -1)
[main] TRACE xyz.docbleach.module.zip.ArchiveBleach - Entry: word/endnotes.xml - Size: (original: -1, compressed: -1)
[main] TRACE xyz.docbleach.module.zip.ArchiveBleach - Entry: word/fontTable.xml - Size: (original: -1, compressed: -1)
[main] TRACE xyz.docbleach.module.zip.ArchiveBleach - Entry: word/footnotes.xml - Size: (original: -1, compressed: -1)
[main] TRACE xyz.docbleach.module.zip.ArchiveBleach - Entry: word/header1.xml - Size: (original: -1, compressed: -1)
[main] TRACE xyz.docbleach.module.zip.ArchiveBleach - Entry: word/_rels/header1.xml.rels - Size: (original: -1, compressed: -1)
[main] TRACE xyz.docbleach.module.zip.ArchiveBleach - Entry: word/settings.xml - Size: (original: -1, compressed: -1)
[main] TRACE xyz.docbleach.module.zip.ArchiveBleach - Entry: word/styles.xml - Size: (original: -1, compressed: -1)
[main] TRACE xyz.docbleach.module.zip.ArchiveBleach - Entry: word/stylesWithEffects.xml - Size: (original: -1, compressed: -1)
[main] TRACE xyz.docbleach.module.zip.ArchiveBleach - Entry: word/theme/theme1.xml - Size: (original: -1, compressed: -1)
[main] TRACE xyz.docbleach.module.zip.ArchiveBleach - Entry: word/webSettings.xml - Size: (original: -1, compressed: -1)
[main] WARN xyz.docbleach.cli.Main - Sanitized file has been saved, 1 potential threat(s) removed.
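
One possible explanation, not confirmed in this issue: when an archive is read through java.util.zip.ZipInputStream, ZipEntry.getSize() returns -1 for entries whose sizes are only known after their data has been read. If that is the cause here, the real size can be measured by counting bytes while the entry is consumed. `ZipEntrySizes` is a hypothetical helper illustrating that idea.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: measures uncompressed entry sizes by reading them. */
final class ZipEntrySizes {
    static Map<String, Long> measure(InputStream in) throws IOException {
        Map<String, Long> sizes = new LinkedHashMap<>();
        byte[] chunk = new byte[8192];
        try (ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                // entry.getSize() may still be -1 at this point; count instead.
                long count = 0;
                int n;
                while ((n = zin.read(chunk)) != -1) {
                    count += n;
                }
                sizes.put(entry.getName(), count);
            }
        }
        return sizes;
    }
}
```

Logging the measured value after an entry has been processed, instead of the header value before, would avoid printing -1.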

Project logo

It would be awesome to have a pretty logo for this project, right?
