panhunt's People

Contributors

bli-dn, jelly, rbsec, srepmub

panhunt's Issues

Error when scanning pst file

Hi there,
I get the following error when trying to scan a folder containing a PST file. Is anything else required to scan a PST successfully?

File "pst.py", line 1185, in __init__
class EntryID:
struct.error: unpack requires a string argument of length 24
Failed to execute script panhunt

Thanks :-)
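
The error message indicates that `struct.unpack` received fewer bytes than the 24 its format requires. A minimal sketch of a defensive parse follows. The `<4s16sI` layout (flags, provider UID, node ID) follows the EntryID description in the MS-PST specification, and `parse_entry_id` is a hypothetical helper, not a function in pst.py.

```python
import struct

# Assumed EntryID layout: 4-byte flags, 16-byte provider UID, 4-byte NID.
ENTRY_ID = struct.Struct('<4s16sI')  # 4 + 16 + 4 = 24 bytes


def parse_entry_id(data):
    """Return (flags, uid, nid), or None if data is not exactly 24 bytes."""
    if len(data) != ENTRY_ID.size:
        # Truncated or malformed property value: report it instead of raising.
        return None
    return ENTRY_ID.unpack(data)
```

Returning `None` lets the caller skip or log the malformed property rather than abort the whole scan.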

Feature Request: Multi-threading

I know this may be a non-trivial request, but it would be great if the program could use more than one CPU.
Obviously one could run multiple commands on different paths (which is what I'm doing right now), but that is still not as efficient as it could be.
It would also be good to have an option to limit how many CPUs it uses so you can control performance a bit.

Thanks!
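
One way the request could be approached is a process pool with a configurable worker count. This is only a sketch: `PAN_LIKE` is a placeholder pattern rather than PANhunt's actual regex, and `count_matches` and `scan_paths` are hypothetical names.

```python
import re
from concurrent.futures import ProcessPoolExecutor

# Placeholder pattern for illustration only; not PANhunt's actual PAN regex.
PAN_LIKE = re.compile(rb'\b\d{16}\b')


def count_matches(path):
    """Return (path, number of 16-digit runs found in the file)."""
    with open(path, 'rb') as handle:
        return path, len(PAN_LIKE.findall(handle.read()))


def scan_paths(paths, workers=None, executor_cls=ProcessPoolExecutor):
    """Scan files in parallel; workers=None lets the executor pick a default."""
    with executor_cls(max_workers=workers) as pool:
        return dict(pool.map(count_matches, paths))
```

Calling `scan_paths(files, workers=2)` would cap the scan at two worker processes. On Windows the call needs to sit under an `if __name__ == '__main__':` guard, because worker processes re-import the main module.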

Have a beginning of masked PAN output

The industry standard for masking a PAN is to retain at most the first 6 and last 4 digits.
self.pan[:6]+re.sub('\d','.',self.pan[6:-4]) + self.pan[-4:]
https://www.pcicomplianceguide.org/whats-the-best-practice-for-masking-or-truncating-pan/
Showing less is better.

Basically, what I would like is to have the beginning and end of the found PAN so I can search for it more easily.
So the first 2 and last 4 digits would be adequate as well:
self.pan[:2]+re.sub('\d','.',self.pan[2:-4]) + self.pan[-4:]

I also replaced * with . so the masked value is directly usable as a regexp for searching.
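
The two variants above can be folded into one helper with the kept lengths as parameters. This is a sketch; `mask_pan` is a hypothetical name and not part of PANhunt.

```python
import re


def mask_pan(pan, keep_first=6, keep_last=4):
    """Replace the middle digits of a PAN with '.', leaving separators as-is."""
    middle = re.sub(r'\d', '.', pan[keep_first:-keep_last])
    return pan[:keep_first] + middle + pan[-keep_last:]


masked = mask_pan('4111111111111111', keep_first=2)
# The masked value doubles as a regex that matches the original PAN.
assert re.fullmatch(masked, '4111111111111111')
```

Because only digits are replaced, a PAN written with spaces or dashes keeps its separators in the masked output.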

nameid_guidstream not accessible from outside

I'd like to see the nameid GUIDs, but nameid.wGuid is just the index, and 'nameid_guidstream' is not accessible from the outside. It also seems useful to interpret wGuid in (1,2). What I currently use:

@@ -1495,6 +1495,14 @@ class Messaging:
             if nameid.N == 1:
                 name_len = struct.unpack('I', nameid_stringstream[nameid.dwPropertyID:nameid.dwPropertyID+4])[0]
                 nameid.name = nameid_stringstream[nameid.dwPropertyID+4:nameid.dwPropertyID+4+name_len].decode('utf-16-le') # unicode
+            if nameid.wGuid == 0:
+                nameid.guid = None
+            elif nameid.wGuid == 1: # PS_MAPI
+                nameid.guid = '(\x03\x02\x00\x00\x00\x00\x00\xc0\x00\x00\x00\x00\x00\x00F'
+            elif nameid.wGuid == 2: # PS_PUBLIC_STRINGS
+                nameid.guid = ')\x03\x02\x00\x00\x00\x00\x00\xc0\x00\x00\x00\x00\x00\x00F'
+            else:
+                nameid.guid = nameid_guidstream[16*(nameid.wGuid-3):16*(nameid.wGuid-2)]
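
The same lookup can be written as a standalone function, with `uuid.UUID(bytes_le=...)` used to render the raw 16 bytes in the familiar textual form. This is a sketch; `resolve_guid` and `format_guid` are hypothetical names, and the byte constants are copied from the diff above.

```python
import uuid

# Well-known property set GUIDs in little-endian byte order, as in the diff.
PS_MAPI = b'(\x03\x02\x00\x00\x00\x00\x00\xc0\x00\x00\x00\x00\x00\x00F'
PS_PUBLIC_STRINGS = b')\x03\x02\x00\x00\x00\x00\x00\xc0\x00\x00\x00\x00\x00\x00F'


def resolve_guid(w_guid, guid_stream):
    """Map a wGuid index to its 16 raw GUID bytes (None for index 0)."""
    if w_guid == 0:
        return None
    if w_guid == 1:
        return PS_MAPI
    if w_guid == 2:
        return PS_PUBLIC_STRINGS
    # Indexes 3 and up point into the GUID stream, 16 bytes per entry.
    start = 16 * (w_guid - 3)
    return guid_stream[start:start + 16]


def format_guid(raw):
    """Render raw little-endian GUID bytes as a canonical GUID string."""
    return str(uuid.UUID(bytes_le=raw))
```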

large/ish folders broken

Thanks for the very nicely written pst.py! I was building something similar until I found this. I would love to help push this as "python-pst" into, for example, Debian, to give it a bit more visibility.

I found a pretty serious bug though (just one for now), which causes large folders to appear empty. The following makes the problem obvious ('bytes' doesn't change!) and fixes it for my current test case (no mail in a large inbox):

bth_intermediate = bth_working_stack.pop()
+bytes = hn.get_hid_data(bth_intermediate.hidNextLevel)

cheers, thanks again.
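
The nature of the bug can be shown with a toy stack-based tree walk: each node popped from the stack needs its own data fetched, otherwise the loop keeps re-reading the first node. This is a simplified model, not pst.py's BTH code; `pages` and `collect_leaf_records` are hypothetical.

```python
def collect_leaf_records(pages, root_hid, depth):
    """Walk a tree of pages iteratively and return all leaf records in order.

    pages maps an id to a list: child ids for intermediate pages,
    records for leaf pages. depth is the number of intermediate levels.
    """
    records = []
    stack = [(root_hid, depth)]
    while stack:
        hid, level = stack.pop()
        # Fetch the data for *this* node on every iteration. Reusing the
        # data read for an earlier node is the bug described above.
        data = pages[hid]
        if level == 0:
            records.extend(data)
        else:
            stack.extend((child, level - 1) for child in reversed(data))
    return records
```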

regex exclusion & exclusion based on prior string

I found that on Windows Server 2019 almost every user folder has a few false-positive detections.

C:\Users__\Local Settings\Packages\Microsoft.Windows.Cortana_cw5n1h2txyewy\

Is there any possibility to exclude detections based on a regular expression, or can I only exclude "C:\Users"?

Also, would it be possible to exclude based on what comes before the PAN, e.g. if "messageID:" appears on the same line before the falsely detected PAN?
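
Both kinds of exclusion asked about here could look roughly like the sketch below. Neither function exists in PANhunt; the names and the idea of passing in lists of patterns and prefixes are assumptions for illustration.

```python
import re


def is_path_excluded(path, exclude_patterns):
    """True if any regular expression matches somewhere in the path."""
    return any(re.search(p, path, re.IGNORECASE) for p in exclude_patterns)


def is_match_excluded(line, match_start, excluded_prefixes):
    """True if any prefix string occurs on the line before the match."""
    before = line[:match_start]
    return any(prefix in before for prefix in excluded_prefixes)


# Example: skip every Cortana package folder, whichever user it belongs to.
cortana = [r'\\Packages\\Microsoft\.Windows\.Cortana_']
```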

Operation not permitted

error while loading shared libraries: libz.so.1: failed to map segment from shared object: Operation not permitted

"search" default in panhunt.ini seems to be ignored

I changed panhunt.ini to scan D:\ by default by changing the first line under [DEFAULT] from:
#search = C:\
to:
search = D:\
but it still seems to be scanning drive C:.
Using -s d:\ on the command line works fine.
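
The behaviour the reporter expects is a precedence of command line over ini file over built-in default. The sketch below shows that precedence with `argparse` and `configparser`; it is not PANhunt's actual option handling, and `resolve_search_dir` is a hypothetical name.

```python
import argparse
import configparser


def resolve_search_dir(ini_text, argv, builtin_default='C:\\'):
    """Pick the search directory: -s on the command line, then ini, then default."""
    parser = argparse.ArgumentParser()
    # default=None makes "not given on the command line" detectable.
    parser.add_argument('-s', dest='search', default=None)
    args = parser.parse_args(argv)

    config = configparser.ConfigParser()
    config.read_string(ini_text)

    if args.search is not None:
        return args.search
    return config['DEFAULT'].get('search', fallback=builtin_default)
```

If the command-line parser instead supplies its own non-empty default for `-s`, that value always looks like an explicit choice and the ini setting never gets a chance; whether that is what happens in PANhunt is not confirmed here.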

datetime limitation (Y10K)

This may be a bit silly, but for MAPI dates (PTypTime) beyond Y10K, which are 'valid' on the MAPI side though probably not very meaningful, pst.py will now raise a PSTException in the 'get_time' method (year out of range), which probably means some other steps are skipped when processing the respective message.

In our version, we just skip the intermediate datetime representation (as it's MAPI-to-MAPI), but you might want to silently convert such dates to the year 9999, or something similar, to avoid skipping steps because of the exception. (There's nothing really wrong on the MAPI side.)
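
The clamping suggested above could be sketched as follows, treating the value as a count of 100-nanosecond intervals since 1601-01-01. `filetime_to_datetime` is a hypothetical stand-in and not pst.py's `get_time`.

```python
from datetime import datetime, timedelta

FILETIME_EPOCH = datetime(1601, 1, 1)


def filetime_to_datetime(ticks):
    """Convert 100 ns ticks since 1601-01-01, clamping overflow to year 9999."""
    try:
        return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)
    except OverflowError:
        # Beyond what datetime can represent: clamp instead of raising.
        return datetime.max
```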

Exclude temp files that Office opens

Thanks for the great program!

When I run a scan on my computer, I get some errors on the temp files that Office opens, for example:
ERROR Invalid ZIP file on C:\xxxxxx\~$LDAP OPERA_GROUP AND USERS.xlsx:0
ERROR Invalid ZIP file on C:\xxxxxx\xxxx\~$Schema 2 Opera user permission.csv.xlsx
ERROR Invalid ZIP file on C:\xxxxxx\xxxx\W10\~$usersyyyyy18072019.xlsx

Usually these files just list which user has the file open, so they can probably be excluded from scanning, as it is very unlikely that they store any of the original file's data.
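
A filter for these files only needs to look at the file name prefix, as the paths in the errors above all have a base name starting with `~$`. This is a sketch; `is_office_lock_file` is a hypothetical helper, not a PANhunt function.

```python
import ntpath


def is_office_lock_file(path):
    """True for Office owner/lock files, whose base name starts with '~$'."""
    # ntpath splits on both '\\' and '/', so this also works off Windows.
    return ntpath.basename(path).startswith('~$')
```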

Error when I try to work with files that syslog writes

Traceback (most recent call last): | ETA: 2:15:35 PANs:0
File "panhunt.py", line 251, in <module>
total_files_searched, pans_found, all_files = hunt_pans()
File "panhunt.py", line 196, in hunt_pans
total_docs, doc_pans_found = filehunt.find_all_regexs_in_files([afile for afile in all_files if not afile.errors and afile.type in ('TEXT','ZIP','SPECIAL')], pan_regexs, search_extensions, 'PAN', gauge_update_function)
File "/home/kochetkov/PANhunt-master/filehunt.py", line 326, in find_all_regexs_in_files
matches = afile.check_regexs(regexs, search_extensions)
File "/home/kochetkov/PANhunt-master/filehunt.py", line 100, in check_regexs
except WindowsError:
NameError: global name 'WindowsError' is not defined
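
`WindowsError` is only defined when Python runs on Windows, so naming it in an `except` clause raises `NameError` elsewhere as soon as an exception has to be matched against it. Since it is a subclass of `OSError`, catching `OSError` (plus `IOError` for Python 2) is a portable alternative. The sketch below uses a hypothetical `read_bytes` helper rather than filehunt.py's actual `check_regexs`.

```python
def read_bytes(path):
    """Return (data, error); error is a message if the file cannot be read."""
    try:
        with open(path, 'rb') as handle:
            return handle.read(), None
    except (IOError, OSError) as exc:
        # OSError also covers WindowsError when running on Windows.
        return None, str(exc)
```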

is this still being supported?

I am looking at this and it seems like it might be worth trying. I just want to be sure that it is still supported; if so, I might make a few commits myself.

Is this still being supported?

cannot access submessages

The following allows easy access to submessages (embedded messages), in other words attachments of type ATTACH_EMBEDDED_MSG. If we have a property of type PR_ATTACH_DATA_OBJ, we can just call pst.Message, passing the parent message and the property value (which is now the subnode nid), as follows, to create a message object that we can process further (via submessage.pc.props etc.).

current calling code:

 subnode_nid = struct.unpack('I', prop.value)[0]
 submessage = pst.Message(subnode_nid, self.ltp, self.nbd, parent_message)

current change to make this work:

diff --git a/ECtools/pst/kopano_pst/pst.py b/ECtools/pst/kopano_pst/pst.py
index 9cf4a10..00856c3 100644
--- a/ECtools/pst/kopano_pst/pst.py
+++ b/ECtools/pst/kopano_pst/pst.py
@@ -795,7 +795,7 @@ class PType:
         elif self.ptype == PTypeEnum.PtypNull:
             return None
         elif self.ptype == PTypeEnum.PtypObject:
-            return bytes
+            return bytes[:4]
         else:
             raise PSTException('Invalid PTypeEnum for value %s ' % self.ptype)

@@ -1149,7 +1149,7 @@ class LTP:
             PTypeEnum.PtypMultipleBinary:PType(PTypeEnum.PtypMultipleBinary, 2, False, True),
             PTypeEnum.PtypUnspecified:PType(PTypeEnum.PtypUnspecified, 0, False, False),
             PTypeEnum.PtypNull:PType(PTypeEnum.PtypNull, 0, False, False),
-            PTypeEnum.PtypObject:PType(PTypeEnum.PtypObject, 0, False, False)
+            PTypeEnum.PtypObject:PType(PTypeEnum.PtypObject, 4, False, True)
         }


@@ -1347,12 +1347,18 @@ class Message:
     afStorage = 0x06


-    def __init__(self, nid, ltp):
+    def __init__(self, nid, ltp, nbd=None, parent_message=None):

-        if nid.nidType != NID.NID_TYPE_NORMAL_MESSAGE:
-            raise PSTException('Invalid Message NID Type: %s' % nid_pc.nidType)
         self.ltp = ltp
-        self.pc = ltp.get_pc_by_nid(nid)
+        if parent_message:
+            subnode = parent_message.pc.hn.subnodes[nid]
+            datas = nbd.fetch_all_block_data(subnode.bidData)
+            hn = HN(subnode, ltp, datas)
+            self.pc = PC(hn)
+        else:
+            if nid.nidType != NID.NID_TYPE_NORMAL_MESSAGE:
+                raise PSTException('Invalid Message NID Type: %s' % nid_pc.nidType)
+            self.pc = ltp.get_pc_by_nid(nid)
         self.MessageClass = self.pc.getval(PropIdEnum.PidTagMessageClassW)
         self.Subject = ltp.strip_SubjectPrefix(self.pc.getval(PropIdEnum.PidTagSubjectW))
         self.ClientSubmitTime = self.pc.getval(PropIdEnum.PidTagClientSubmitTime)

UnicodeDecodeError: 'utf16' codec can't decode bytes

When scanning a fileserver, it didn't like one of the .PST files:

Traceback (most recent call last):|                     | ETA:  --:--:-- PANs:0
  File "panhunt.py", line 251, in <module>
  File "panhunt.py", line 198, in hunt_pans
  File "filehunt.py", line 349, in find_all_regexs_in_psts
  File "filehunt.py", line 139, in check_pst_regexs
  File "pst.py", line 1989, in get_total_attachment_count
  File "pst.py", line 1924, in message_generator
  File "pst.py", line 1329, in __init__
  File "pst.py", line 1135, in get_pc_by_nid
  File "pst.py", line 905, in __init__
  File "pst.py", line 627, in __init__
  File "pst.py", line 707, in value
  File "encodings\utf_16_le.py", line 16, in decode
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 508-509: unexpected end of data
Failed to execute script panhunt

Not quite sure why. Could it be that the file changed during the scan?

Unable to filter to just PSTs

I can't seem to figure out how to filter down to just PST files for the Windows executable to scan. I want to reduce the amount of scanning that the script has to do to a minimum as I am only concerned about PST files. Is there any way to do this? I've tried the -m flag but with just this the script will still scan all files.

Compiled Win exe won't run

I compiled the script as described and when I run the exe I get the following error:

Traceback (most recent call last):
File "panhunt.py", line 251, in <module>
File "panhunt.py", line 193, in hunt_pans
File "filehunt.py", line 263, in find_all_files_in_directory
AttributeError: 'module' object has no attribute 'Percentage'
Failed to execute script panhunt

I have been unable to determine what the issue is. This is an awesome tool that could be very useful for me. I appreciate you sharing it and any help getting it to work if you can. -Matt

incorrectly encoded string property raises exception

One of our customers provided a PST file that contains an incorrectly encoded PR_BODY property somewhere. Looking at it, it contains the following byte sequence:

0x3d 0xd8 0xd 0x0

So a surrogate pair is started, but the next 2 bytes to complete it aren't there, or something like that. It seems useful for your version to at least add errors='ignore' to the decode call, so the whole process can just carry on.

elif self.ptype == PTypeEnum.PtypString:
-            return bytes.decode('utf-16-le') # unicode
+            return bytes.decode('utf-16-le', errors='ignore') # unicode
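
The effect of the change can be reproduced with the byte sequence quoted above. The sketch below is written for Python 3 and uses a hypothetical `decode_utf16` wrapper; the two leading characters are added only to show that surrounding text survives.

```python
# 'ab' followed by the reported bytes: a high surrogate (0xD83D) that is
# followed by 0x000D instead of the low surrogate needed to complete the pair.
BAD = 'ab'.encode('utf-16-le') + b'\x3d\xd8\x0d\x00'


def decode_utf16(raw, lenient=True):
    """Decode UTF-16-LE, optionally dropping undecodable code units."""
    return raw.decode('utf-16-le', errors='ignore' if lenient else 'strict')


try:
    decode_utf16(BAD, lenient=False)
except UnicodeDecodeError as exc:
    print('strict decode failed:', exc.reason)

print(repr(decode_utf16(BAD)))
```

With `errors='replace'` the bad code unit would show up as U+FFFD instead of vanishing, which may be preferable when silent data loss matters.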
    
