simsong / bulk_extractor Goto Github PK

This is the development tree. Production downloads are at:

Home Page: https://github.com/simsong/bulk_extractor/releases

License: Other

Shell 14.59% Makefile 0.62% HTML 4.58% Java 0.29% C++ 53.49% C 5.60% Python 12.92% NSIS 0.89% Lex 2.45% M4 3.24% Rich Text Format 1.32%

bulk_extractor's Introduction

bulk_extractor is a high-performance digital forensics exploitation tool. It is a "get evidence" button that rapidly scans any kind of input (disk images, files, directories of files, etc) and extracts structured information such as email addresses, credit card numbers, JPEGs and JSON snippets without parsing the file system or file system structures. The results are stored in text files that are easily inspected, searched, or used as inputs for other forensic processing. bulk_extractor also creates histograms of certain kinds of features that it finds, such as Google search terms and email addresses, as previous research has shown that such histograms are especially useful in investigative and law enforcement applications.

Unlike other digital forensics tools, bulk_extractor probes every byte of data to see if it is the start of a sequence that can be decompressed or otherwise decoded. If so, the decoded data are recursively re-examined. As a result, bulk_extractor can find things like BASE64-encoded JPEGs and compressed JSON objects that traditional carving tools miss.

This is the bulk_extractor 2.1 development branch! It is reliable, but if you want to have a well-tested production quality release, download a release from https://github.com/simsong/bulk_extractor/releases.

Building `bulk_extractor`

We recommend building from sources. We provide a number of bash scripts in the etc/ directory that will configure a clean virtual machine:

git clone --recurse-submodules https://github.com/simsong/bulk_extractor.git
./bootstrap.sh
./configure
make
make install

For detailed instructions on installing packages and building bulk_extractor, read the wiki page here: https://github.com/simsong/bulk_extractor/wiki/Installing-bulk_extractor

For more information on bulk_extractor, visit: https://forensics.wiki/bulk_extractor

Tested Configurations

This release of bulk_extractor requires C++17 and has been tested to compile on the following platforms:

Amazon Linux as of 2023-05-25
Fedora 36 (most recently)
Ubuntu 20.04LTS
MacOS 13.2.1

You should always start with a fresh VM and prepare the system using the appropriate prep script in the etc/ directory.

Tested Configurations Which bulk_extractor Does Not Work

Debian 10 (is not supported for native builds))

RECOMMENDED CITATION

If you are writing a scientific paper and using bulk_extractor, please cite it with:

Garfinkel, Simson, Digital media triage with bulk data analysis and bulk_extractor. Computers and Security 32: 56-72 (2013)

@article{10.5555/2748150.2748581,
author = {Garfinkel, Simson L.},
title = {Digital Media Triage with Bulk Data Analysis and Bulk_extractor},
year = {2013},
issue_date = {February 2013},
publisher = {Elsevier Advanced Technology Publications},
address = {GBR},
volume = {32},
number = {C},
issn = {0167-4048},
journal = {Comput. Secur.},
month = feb,
pages = {56–72},
numpages = {17},
keywords = {Digital forensics, Bulk data analysis, bulk_extractor, Stream-based forensics, Windows hibernation files, Parallelized forensic analysis, Optimistic decompression, Forensic path, Margin, EnCase}
}

ENVIRONMENT VARIABLES

The following environment variables can be set to change the operation of bulk_extractor:

Variable	Behavior
`DEBUG_BENCHMARK_CPU`	Include CPU benchmark information in `report.xml` file
`DEBUG_NO_SCANNER_BYPASS`	Disables scanner bypass logic that bypasses some scanners if an sbuf contains ngrams or does not have a high distinct character count.
`DEBUG_HISTOGRAMS`	Print debugging information on file-based histograms.
`DEBUG_HISTOGRAMS_NO_INCREMENTAL`	Do not use incremental, memory-based histograms.
`DEBUG_PRINT_STEPS`	Prints to stdout when each scanner is called for each sbuf
`DEBUG_DUMP_DATA`	Hex-dump each sbuf that is to be scanned.
`DEBUG_SCANNERS_IGNORE`	A comma-separated list of scanners to ignore (not load). Useful for debugging unit tests.

Other hints for debugging:

Run -xall to run without any scanners.
Run with a random sampling of 0.001% to debug reading image size and a few quick seeks.

BUILDING ON WINDOWS

Note: Currenlty bulk_extractor 2.1 does not build on windows, but 2.0 does.

If you wish to build for Windows, you should cross-compile from a Fedora system. Start with a clean VM and use these commands:

$ git clone --recurse-submodules https://github.com/simsong/bulk_extractor.git
$ cd bulk_extractor/etc
$ bash CONFIGURE_FEDORA36_win64.bash
$ cd ..
$ make win64

BULK_EXTRACTOR 2.0 STATUS REPORT

bulk_extractor 2.0 (BE2) is now operational. Although it works with the Java-based viewer, we do not currently have an installer that runs under Windows.

BE2 requires C++17 to compile. It requires https://github.com/simsong/be13_api.git as a sub-module, which in turn requires dfxml as a sub-module.

The project took longer than anticipated. In addition to updating to C++17, It was used as an opportunity for massive code refactoring and general increase in code quality, testability and reliability. An article about the experiment will appear in a forthcoming issue of ACM Queue

bulk_extractor's People

Contributors

Stargazers

Watchers

Forkers

mikehom sunithamisra ajnelson kefir- endeav0r stumpyuk1 brucemty ekristen jonstewart helias0509 dkarpo narayana1208 andy737 ralfg1 mattdri-ir champ1 relvinhas gklansbu uckelman schism ashamsu chubbymaggie posophe dulani enjoyhacking andreacosta hapser bluebear171 jinhaoxia phulc omnifocal prosciana fork42541 wflk burdzwastaken detrojones lucabongiorni uckelman-sf flakfizer noobfromvn alchemycyberblaze sf-jonstewart estuart redteamcaliber dfirgeek olivierh59500 labgeek utsa-cyber mfalconi thedeserter enascimento norsig cescon179 suhyeonjin sts0mrg0 xinjijia h4ckl4bm3 naveenselvan lorz ir4n6 moddingg33k ro9ueadmin jilir tree0flife weeshlow jfire5401 edsu cameronnielsen sweettimo dbuentello n4rr34n6 luojiacs woolverino scr3w-2ooth 0xfatty lukecoughlan madisettisunil skolldz tw4l seabreg chasewyrick klouie13 dlat solodky 0b0ltus 4n6ist justforkin fake4d zerox1b erdoukki blue-infosec 5l1v3r1 bodzihackerone joachimmetz wedataintelligence netredo hacker4help global19 global-localhost global19-atlassian-net

bulk_extractor's Issues

API to analyze a block with a feature recorder call-back

Need a simple shared library API and demo program that shows bulk_extractor analyzing a block of data and performing callbacks to record features that are found.

Scanning in recursive mode drops features and files

When running the bulk_extractor in recursive directory scan mode (-R), bulk_extractor drops features and files:

If a feature is encountered but a feature has already been recorded at that Forensic Path from another file, then the feature is dropped.
If a filename is not simple ASCII, bulk_extractor will skip the file and not scan it.

This behavior limits completeness of scans using recursive mode.

Whitelist stats go to stdout but not to report.xml

Whitelist stats are reported to stdout but not to report.xml.

Specifically:

When bulk_extractor initializes in main.cpp, it reads any alert list(s) and stop list(s) using function word_and_context_list::readfile in file word_and_context_list.cpp. Unfortunately, bulk_extractor does this before opening report.xml as pointer variable dfxml_writer *xreport, so it is not yet ready to write to report.xml.

To fix this:

Move instantiation of xreport way up near the top,
being careful not to disrupt behavior in the event of an error or if bulk_extractor is being restarted.
Pass the xreport pointer as a new parameter to word_and_context_list::readfile() so that readfile can write the stats directly into report.xml.
I recommend the same treatment of passing xreport to any function
that prints to stdout wherever the user also wants the output to go into report.xml.

python bulkextractor_reader should have an iterator for reports

Currently the iterator only works with report directories and zip files of report directories. It should be modified so that it can handle top-level directories or zip files with multiple reports and return an interator for all of the reports, and each report an iterator for all of the enclosed feature files.

JVM required by BEViewer

A discussion on bulk_extractor-users group resolved in that it is good to compile BEViewer on the latest compiler. The next BEViewer will be compiled on OpenJDK and will require Java 7 JRE.

YY_FATAL_ERROR macro called in scan_email.cpp or scan_accts.cpp.

When running bulk_extractor I am getting the following error message:

input buffer overflow, can't enlarge buffer because scanner uses REJECT
Segmentation fault

Does anyone know what is causing this error message? I am running this under Linux Mint on a 8 core machine with 12 GB of RAM.

bulk_extractor reader should provide a method to split fields

If there aren't enough fields, it should log an error with a line number but keep going.

Don't hardcode #!/usr/bin/python but use #!/usr/bin/env python

I noticed that some files use the #!/usr/bin/env python approach while some are stuck at the hard-coded paths.

The files I found are:
bulk_diff.py, bulk_extractor_reader.py, identify_filenames.py, post_process_exif.py.

Thanks!

scan_kml needs to clean output

Currently scan_kml doesn't clean the tags at the end of a KML scan properly.

scanners should have banner_stamp

The scanner API should allow a scanner to add an annotation to the banner list.

bulk_extractor hangs on FIFO pipes

When providing bulk_extractor (v 1.4.2) with a directory to processes (either a single directory or recursively), bulk_extractor seems to hang indefinitely when it encounters a FIFO pipe (see http://en.wikipedia.org/wiki/Named_pipe).

I've added a patch to correct this behaviour here: https://gist.github.com/ajengle/7998683

In this patch, we simply check the files as you stat them to determine if they are a FIFO pipe. If they are, we skip.

should scan_winpe should store PE headers in files?

Since there are other tools for analyzing PE headers, it may make more sense to store them in individual files than to break them out into XML

Missing 2 email addresses in nps-2010-emails

When processing nps-2010-emails, bulk_extractor misses two email addresses that are in PDF files (and were generated by Microsoft Word). Perhaps this is because the PDF text extractor is now missing them. It should be fixed. See http://digitalcorpora.org/archives/173/comment-page-1#comment-124731

TZ Typo?

Looking at line 48 of /src / scan_email_lg.cpp, it looks like the ABBREV constant has a value of 'UT' instead of 'UTC'.

Was this a typo or a deliberate choice?

Whitelist system may not work properly with exif XML output

From the mailing list:

I ran bulk_extractor against an image and then re-ran it against the
same image again giving it -w exif.txt from the first run.  This
should have resulted in all exif features being stopped, but I get a
non-empty exif file on the second run:

This is the entire exif feature file from the second run.

# UTF-8 Byte Order Marker; see http://unicode.org/faq/utf_bom.html
# BULK_EXTRACTOR-Version: 1.3.1 ($Rev: 10844 $)
# Feature-Recorder: exif
# Filename: win7.vmdk
# Feature-File-Version: 1.1
292220928   288a8ed63c00c1b39343dbe82a090cd0    <exif><ifd0.tiff.Software>Adobe ImageReady</ifd0.tiff.Software></exif>
4899467264  6d5f317239f1b039bc534660ac2abae4    <exif><ifd0.tiff.Copyright>Microsoft Corporation</ifd0.tiff.Copyright></exif>
4900933632  5aea5473d3bd76a86cf4dbe46385545f    <exif><ifd0.tiff.Copyright>Microsoft Corporation</ifd0.tiff.Copyright></exif>
4890324992  4146c4da38363f4e2862d10c1f84f80d    <exif><ifd0.tiff.Copyright>Will Austin</ifd0.tiff.Copyright></exif>
4895125504  92fc7a14c551dae96c1960074865aa59    <exif><ifd0.tiff.Copyright>Microsoft Corporation</ifd0.tiff.Copyright></exif>
4896358400  72ee2842f3d7872a92964734322cac2b    <exif><ifd0.tiff.Copyright>Microsoft Corporation</ifd0.tiff.Copyright></exif>
4897701888  9a48c674f92171fc20eb1f8a5b8c2e9b    <exif><ifd0.tiff.Copyright>Microsoft Corporation</ifd0.tiff.Copyright></exif>
4918804480  72ee2842f3d7872a92964734322cac2b    <exif><ifd0.tiff.Copyright>Microsoft Corporation</ifd0.tiff.Copyright></exif>
4920516608  5b57a8c6cd9393c567f89f0f4cc89522    <exif><ifd0.tiff.Copyright>Microsoft Corporation</ifd0.tiff.Copyright></exif>
4922580992  4faf65eb81de15c1a371f53e5a3a38e0    <exif><ifd0.tiff.Copyright>Microsoft Corporation</ifd0.tiff.Copyright></exif>
4924956672  698fcb66721525f86140188781bdb33e    <exif><ifd0.tiff.Copyright>Microsoft Corporation</ifd0.tiff.Copyright></exif>

(there is a tab after the offset on the first line, I don't know why
the mail client doesn't show it).

All of these features are in the exif.txt feature file from the first
run that I used as a stoplist.

Ubuntu 10.04.4 LTS

Prefer octal or hex escape codes

A discussion in bulk_extractor-users group of octal vs. hex escape codes resolved that hex is preferred. Functionally, it doesn't matter, but people visually prefer hex.

Create examples for python module

pcap stoplist

bulk_extractor scan_pcap should support a stoplist of packet artifacts.

URL parse error when surrounded by '"'

Results incorrectly include trailing '&quot' when parsing URLs.

url.txt output:

199452984   http://www.icra.org/ratingsv02.html&quot;   (pics-1.1 &quot;http://www.icra.org/ratingsv02.html&quot; l gen true for 
199453047   http://www.msn.com&quot  true for &quot;http://www.msn.com&quot; r (cz 1 lz 1 n
199453120   http://msn.com&quot  true for &quot;http://msn.com&quot; r (cz 1 lz 1 n
199453189   http://stb.msn.com&quot  true for &quot;http://stb.msn.com&quot; r (cz 1 lz 1 n
199453396   http://www.rsac.org/ratingsv01.html&quot;   z 1 vz 1) &quot;http://www.rsac.org/ratingsv01.html&quot; l gen true for 
199453645   http://stc.msn.com&quot  true for &quot;http://stc.msn.com&quot; r (n 0 s 0 v 0
199453709   http://stj.msn.com&quot  true for &quot;http://stj.msn.com&quot; r (n 0 s 0 v 0

should be:

199452984   http://www.icra.org/ratingsv02.html (pics-1.1 &quot;http://www.icra.org/ratingsv02.html&quot; l gen true for 
199453047   http://www.msn.com   true for &quot;http://www.msn.com&quot; r (cz 1 lz 1 n
199453120   http://msn.com   true for &quot;http://msn.com&quot; r (cz 1 lz 1 n
199453189   http://stb.msn.com   true for &quot;http://stb.msn.com&quot; r (cz 1 lz 1 n
199453396   http://www.rsac.org/ratingsv01.html z 1 vz 1) &quot;http://www.rsac.org/ratingsv01.html&quot; l gen true for 
199453645   http://stc.msn.com   true for &quot;http://stc.msn.com&quot; r (n 0 s 0 v 0
199453709   http://stj.msn.com   true for &quot;http://stj.msn.com&quot; r (n 0 s 0 v 0

Version information:

# BULK_EXTRACTOR-Version: 1.5.5 ($Rev: 10844 $)
# Feature-Recorder: url
# Feature-File-Version: 1.1

Please let me know if I can provide you with any better information.

process_aff appears to be ignoring pagesize

It appears that process_aff::get_sbuf() ignores the pagesize. I don't think that this can all be rewritten to use pread because process_dir needs to be able to return an sbuf for an iterator.

scan_gps has a bad recorder

One of the recorders writing to gps.txt is not putting in the MD5 as the feature in the feature file. This is evidenced when processing the NPS 2TB drive.

Need an Icon

Integrated handling of magic numbers

Scanners should be able to register magic numbers that they can handle. Then other scanners like scan_xor could look for the magic numbers and only xor when they find them... Useful?

Bugfix: cda_tool.py variable name

Line 150 "for fn in fnames:"
should read "for fn in fns:"
to match lines 147,149

otherwise this script fails

java_gui/src/image/ImageReaderManager.java is missing

ImageReaderManager.java is missing in 3df29f7, which is HEAD at present:

[uckelman@scylla java_gui]$ make
make: *** No rule to make target `src/image/ImageReaderManager.java', needed by `BEViewer.jar'. Stop.

Possible typo

This appears to be a typo in the script, should it be getpwuid? It is listed as getwpuid and was missing when the ./configure command was run for Bulk_extractor.

yyFlexLexer warning needs resolved

The following warning should be fixed:

: void yyFlexLexer::LexerError( yyconst char msg[] )
:1662:6: warning: function might be candidate for attribute ‘noreturn’ [-Wsuggest-attribute=noreturn]

bulk_extractor wordlist should be rewritten to use la-strings.

bulk_extractor wordlist currently checks if a byte isprint(ch) && ch!=' ' && ch<128.

An improvement to this would be to support encodings such as UTF-8, UTF-16 and UTF-32, possibly as options specified by the user. The words should then be converted to a single encoding (UTF-8?) and then split/deduped, for possible conversion and use by the target application.

cda_tool.py should not use python3.2 but python3

The frist line reads
#!/usr/bin/env python3.2

I suggest to change it to
#!/usr/bin/env python3
so it works with the current python 3.x, too.
Or is there a reason that makes it not working with 3.3?

finish random sampling implement for E01 and AFF files

scan_elf improvement

If name="" and type="", then the segment is not valid. Stop there.

Custom LEX can't be set during configure

I am trying to ./configure with LEX=/usr/loca/bin/flex (this is needed because /usr/bin/flex doesn't support -R but /usr/local/bin/flex does)

But it is not possible because of those 3 lines in configure.ac:

if test "$LEX" != flex; then
AC_MSG_ERROR([flex not installed; required for compiling regular expressions. Try 'apt-get install flex' or 'yum install flex' or 'port install flex' or whatever package manager you happen to be using....])
fi

So I get the following error:
configure: error: flex not installed; required for compiling regular expressions. Try 'apt-get install flex' or 'yum install flex' or 'port install flex' or whatever package manager you happen to be using....

SHA-1 support

Currently BE uses MD5 as a universal hash. There should be a flag allowing other hash algorithms to be used and reported. The hash in use should be evidenced in the feature files. Also support SHA-3/128, which would be the first 128 bits of SHA-3?

Create Mach-o scanner for Apple binaries.

We need a decoder for macho (Apple) object files.

bootstrap.sh not present in bulk_extractor-1.4.1.tar.gz

The README references bootstrap.sh for OS X but that file isn't present in the distribution.

configure/make builds a working version, so this isn't a major issue.

-David

Unable to compile because of scan_exiv2

Fix hexdigest in scan_exiv2.cpp

Implement PHASE_THREAD_AFTER_SCAN

be13_api/pcap_fake.cpp:2:21: error: tcpflow.h: No such file or directory

I downloaded both bulk_extractor and tcpflow via git. I built and installed tcpflow and am trying to build bulk_extractor but run into the above error. I could hardwire a fix, but I'd like to get this solved properly.

If I copy tcpflow/src/tcpflow.h to /usr/local/include the compile throws this error:

In file included from be13_api/pcap_fake.cpp:2:
/usr/local/include/tcpflow.h:206: error: conflicting declaration ‘typedef size_t socklen_t’

-David

Separate right-side context for scan_acct and scan_email

installer not working

https://github.com/simsong/bulk_extractor/wiki/BEViewer

there is also no MSI package or JAR fileand the current exe fails starting (some jvm error)

User Plugins

As of Version 1.4.4, a user defined Plugin can be loaded only by giving a plugin directory via command line option '-P'. I would appreciate an environment Variable a la PATH (something like BE_PATH), in order to keep the command line short.

Further, BEViewer 1.4.4 can't show the content of a path containing a component belonging to a user defined recursive plugin, as the -P option is not given in the underlying call to bulk_extractor. An environment Variable would solve this issue, too.

The following patch in the source code of bulk_extractor 1.4.4 helps me as a temporary solution. Would be nice if it would be fixed in the next release:

diff -r bulk_extractor-1.4.4/src/main.cpp source/bulk_extractor-patched/src/main.cpp
809a810,820
>     // >>> Patch
>     // add to plugin_path: /usr/local/lib/bulk_extractor:/usr/lib/bulk_extractor:.
>     {
>       const char* p;
>       struct stat s;
>       p="/usr/local/lib/bulk_extractor"; if(stat(p, &s)==0) scanner_dirs.push_back(p);
>       p="/usr/lib/bulk_extractor";       if(stat(p, &s)==0) scanner_dirs.push_back(p);
>       p=".";                                                scanner_dirs.push_back(p);
>     }
>     // <<< Patch
> 
diff -r bulk_extractor-1.4.4/src/be13_api/plugin.cpp source/bulk_extractor-patched/src/be13_api/plugin.cpp
218c218,219
<     std::cout << "Loading: " << fn << " (" << func_name << ")\n";

---
>     // >>> Patch: The following output would confuse BEViewer.
>     // std::cout << "Loading: " << fn << " (" << func_name << ")\n";

encoder_report should have a max_features per file to analyze

Either encoder_report or the bulk_extractor_reader should allow the setting of a maximum number of features per feature file to process to make it faster to debug.

BEViewer shows escape codes rather than just text

Features encoded in UTF-16 encoding show up with escape codes such as "\x00". They need to be displayed as normal characters.

allow feature files to include ?arg=val in forensic path.

The idea is to tack on these fields to the forensic path as URL
query string parameters, e.g., ?re=foo&enc=UTF-8. We'd obviously need
to work out the details about escaping, etc., but there are a few
things to like about this. First, URLs are cool and one can easily
imagine some future web service for exposing bulk_extractor output,
and that's not a bad way to integrate disparate enterprise systems.
Second, the scheme is idempotent, so if you ran a slightly different
set of patterns at a later time, the patterns that remained the same
would generate the same forensic paths. Third, the query parameters
act as annotations to the location of the data.

The main cons are that it reads kind of ugly, and will be a bit harder
to deal with in quick-and-dirty scripts.

add dfxml as a submodule; migrate to that DFXML generator

exiv2 doesn't compile

I started fixing:

--- ./configure.ac.orig 2013-07-12 01:19:20.000000000 +0000
+++ ./configure.ac      2013-07-13 07:43:24.000000000 +0000
@@ -518,8 +518,8 @@
   fi
 fi
 if test x"$exiv2" == x"yes" ; then
-  AC_CHECK_HEADERS([exiv2/image.hpp exiv2/exif.hpp exiv2/error.hpp])
   AC_LANG_PUSH(C++)
+  AC_CHECK_HEADERS([exiv2/image.hpp exiv2/exif.hpp exiv2/error.hpp])
     AC_TRY_COMPILE([#include <exiv2/image.hpp>
                    #include <exiv2/exif.hpp>
                     #include <exiv2/error.hpp>],
--- ./src/scan_exiv2.cpp.orig   2013-05-29 01:03:05.000000000 +0000
+++ ./src/scan_exiv2.cpp        2013-07-13 07:45:01.000000000 +0000
@@ -7,6 +7,7 @@

 #include "config.h"
 #include "bulk_extractor_i.h"
+#include "be13_api/utils.h"

 #include <stdlib.h>
 #include <string.h>
@@ -101,7 +102,7 @@
 void scan_exiv2(const class scanner_params &sp,const recursion_control_block &rcb)
 {
     assert(sp.sp_version==scanner_params::CURRENT_SP_VERSION);
-    if(sp.phase==scanner_params::startup){
+    if(sp.phase==scanner_params::PHASE_STARTUP){
         assert(sp.info->si_version==scanner_info::CURRENT_SI_VERSION);
        sp.info->name  = "exiv2";
         sp.info->author         = "Simson L. Garfinkel";
@@ -112,8 +113,8 @@
        sp.info->flags = scanner_info::SCANNER_DISABLED; // disabled because we have be_exif
        return;
     }
-    if(sp.phase==scanner_params::shutdown) return;
-    if(sp.phase==scanner_params::scan){
+    if(sp.phase==scanner_params::PHASE_SHUTDOWN) return;
+    if(sp.phase==scanner_params::PHASE_SCAN){

        const sbuf_t &sbuf = sp.sbuf;
        feature_recorder *exif_recorder = sp.fs.get_name("exif");

But now I have other issues:

scan_exiv2.cpp: In function 'void scan_exiv2(const scanner_params&, const recursion_control_block&)':
scan_exiv2.cpp:155: error: 'be_hash' was not declared in this scope
scan_exiv2.cpp:186: error: 'xml' is not a class or namespace

lightgrep

I'm getting "error while loading shared libraries: liblightgrep.so.0: cannot open shared object file: No such file or directory" when trying to run bulk extractor. I have lightgrep installed,and was hoping to run bulk extractor with it. This is from a pull made today.

Thanks

User plugins (continued)

Thanks for integrating my suggestions (issue #53).

However, the second part of the patch concerning line 218 of file bulk_extractor-1.4.4/src/be13_api/plugin.cpp was not included, yet.

The output information of this line should go to a log-file (if at all) rather than to cout. Otherwise, BEViewer is not able to show any image data, any more, because the underlying call ''bulk_extractor -p -http ..." will be polluted by this logging information and thus it is no clean http code any more...

fname use after free in process_ewf::open

Hi,

In process_ewf::open, fname is being freed immediately before having being used.
Patch below seems to fix the problem

--- ./src/image_process.h.orig  2014-01-15 15:00:06.000000000 +0000
+++ ./src/image_process.h       2014-06-09 14:15:54.000000000 +0000
@@ -128,7 +128,7 @@
     virtual int open()=0;                                  /* open; return 0 if successful */
     virtual int pread(uint8_t *,size_t bytes,int64_t offset) const =0;     /* read */
     virtual int64_t image_size() const=0;
-    virtual std::string image_fname() const { return image_fname_;}
+    virtual const std::string &image_fname() const { return image_fname_;}

     /* iterator support; these virtual functions are called by iterator through (*myimage) */
     virtual image_process::iterator begin() const =0;

bulk_extractor scan_flexdemo error

When trying to run bulk_extractor (1.5.5) in plugins directory, it throws an error:
"bulk_extractor: symbol lookup error: ./scan_flexdemo.so: undefined symbol: _ZN7beregexC1Esi"