dionach / panhunt


PANhunt searches for credit card numbers (PANs) in directories.

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

panhunt's Issues

Error when scanning pst file

Hi there,
I get the following error when trying to scan a folder that contains a PST file. Is anything else required to scan a PST successfully?

File "pst.py", line 1185, in __init__
class EntryID:
struct.error: unpack requires a string argument of length 24
Failed to execute script panhunt

Thanks :-)
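
The struct.error above means a buffer that was not 24 bytes long reached struct.unpack. A minimal defensive sketch follows, assuming the 24-byte layout is 4 bytes of flags, a 16-byte provider UID and a 4-byte node ID; the format string and the helper name parse_entry_id are illustrative assumptions, not code taken from pst.py:

```python
import struct

# Assumed layout (not confirmed against pst.py): flags, provider UID, node ID.
ENTRYID_FORMAT = '<I16sI'
ENTRYID_SIZE = struct.calcsize(ENTRYID_FORMAT)  # 4 + 16 + 4 = 24 bytes


def parse_entry_id(data):
    """Return (flags, uid, nid), or None when the buffer has the wrong length."""
    if len(data) != ENTRYID_SIZE:
        # A short or empty property value is skipped instead of crashing the scan.
        return None
    return struct.unpack(ENTRYID_FORMAT, data)
```

Returning None lets the caller decide whether to log the malformed entry and carry on with the rest of the file.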

Feature Request: Multi-threading

I know this may be a non-trivial request, but it would be great if the program could use more than one CPU.
Obviously one could run multiple commands on different paths (which is what I'm doing right now), but that is still not as efficient as it could be.
It would also be good to have an option to limit how many CPUs it uses so you can control performance a bit.

Thanks!
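
The request above (several CPUs, with a cap on how many are used) could be sketched with the standard library's concurrent.futures. This is only an illustration: scan_in_parallel and the scan_one callback are hypothetical names, and nothing here reflects how PANhunt is actually structured.

```python
from concurrent.futures import ProcessPoolExecutor


def scan_in_parallel(paths, scan_one, max_workers=2,
                     executor_cls=ProcessPoolExecutor):
    """Run scan_one(path) for every path on at most max_workers workers.

    scan_one is a hypothetical per-path scan function. With the default
    process pool it must be a picklable, module-level function.
    """
    paths = list(paths)
    with executor_cls(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with paths.
        return dict(zip(paths, pool.map(scan_one, paths)))
```

The max_workers argument is the knob that limits CPU usage; passing ThreadPoolExecutor as executor_cls is an option when the work is mostly I/O-bound.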

python 3.8.5 compatibility

Is the PANhunt script designed to work only with Python 2.7?

Show the beginning of the PAN in masked output

The industry standard for masking a PAN is to retain at most the first 6 and last 4 digits:
self.pan[:6] + re.sub(r'\d', '.', self.pan[6:-4]) + self.pan[-4:]
https://www.pcicomplianceguide.org/whats-the-best-practice-for-masking-or-truncating-pan/
Showing fewer digits is better.

Basically, I would like the output to include the beginning and end of each PAN found, so I can search for it more easily.
The first 2 and last 4 digits would be adequate as well:
self.pan[:2] + re.sub(r'\d', '.', self.pan[2:-4]) + self.pan[-4:]

I also replaced * with . so the masked value is directly usable as a regexp for searching.
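
As a standalone sketch of the masking described above (mask_pan and its parameters are illustrative names, not part of PANhunt):

```python
import re


def mask_pan(pan, keep_first=6, keep_last=4, mask_char='.'):
    """Mask the middle digits of a PAN, keeping the first and last few."""
    if len(pan) <= keep_first + keep_last:
        raise ValueError('PAN is too short to mask with these settings')
    end = len(pan) - keep_last
    # Only digits are replaced, so separators such as '-' stay visible.
    return pan[:keep_first] + re.sub(r'\d', mask_char, pan[keep_first:end]) + pan[end:]


masked = mask_pan('4111111111111111')                  # '411111......1111'
shorter = mask_pan('4111111111111111', keep_first=2)   # '41..........1111'
```

Because the mask character is '.', the masked value can be fed straight back into a regex search, for example re.fullmatch(masked, '4111111111111111').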

nameid_guidstream not accessible from outside

I'd like to see the nameid GUIDs, but nameid.wGuid is just the index, and 'nameid_guidstream' is not accessible from outside. It also seems useful to interpret the wGuid values 1 and 2. This is what I currently use:

@@ -1495,6 +1495,14 @@ class Messaging:
             if nameid.N == 1:
                 name_len = struct.unpack('I', nameid_stringstream[nameid.dwPropertyID:nameid.dwPropertyID+4])[0]
                 nameid.name = nameid_stringstream[nameid.dwPropertyID+4:nameid.dwPropertyID+4+name_len].decode('utf-16-le') # unicode
+            if nameid.wGuid == 0:
+                nameid.guid = None
+            elif nameid.wGuid == 1: # PS_MAPI
+                nameid.guid = '(\x03\x02\x00\x00\x00\x00\x00\xc0\x00\x00\x00\x00\x00\x00F'
+            elif nameid.wGuid == 2: # PS_PUBLIC_STRINGS
+                nameid.guid = ')\x03\x02\x00\x00\x00\x00\x00\xc0\x00\x00\x00\x00\x00\x00F'
+            else:
+                nameid.guid = nameid_guidstream[16*(nameid.wGuid-3):16*(nameid.wGuid-2)]
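
The same lookup, pulled out of the diff into a standalone function for illustration. The name resolve_guid is hypothetical; the two constants are the byte strings from the patch above, written as bytes literals:

```python
import uuid

# Well-known property set GUIDs in little-endian byte order, as in the patch.
PS_MAPI = b'(\x03\x02\x00\x00\x00\x00\x00\xc0\x00\x00\x00\x00\x00\x00F'
PS_PUBLIC_STRINGS = b')\x03\x02\x00\x00\x00\x00\x00\xc0\x00\x00\x00\x00\x00\x00F'


def resolve_guid(w_guid, guid_stream):
    """Map a wGuid index to 16 raw GUID bytes, or None for index 0."""
    if w_guid == 0:
        return None
    if w_guid == 1:
        return PS_MAPI
    if w_guid == 2:
        return PS_PUBLIC_STRINGS
    # Indexes 3 and up point into the GUID stream, 16 bytes per entry.
    start = 16 * (w_guid - 3)
    return guid_stream[start:start + 16]


# uuid.UUID(bytes_le=...) gives a readable form of the raw bytes.
readable = str(uuid.UUID(bytes_le=PS_MAPI))
```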

large/ish folders broken

Thanks for the very nicely written pst.py! I was building something similar until I found this. I would love to help push this as "python-pst" into, for example, Debian, to give it a bit more visibility.

I found a pretty serious bug though (just one for now), which causes large folders to appear empty. The following makes the problem obvious ('bytes' doesn't change!) and fixes it for my current test case (no mail in a large inbox):

bth_intermediate = bth_working_stack.pop()
+bytes = hn.get_hid_data(bth_intermediate.hidNextLevel)

Cheers, thanks again.

add linux gzip (gz) support

As the subject says: please add support for Linux gzip (.gz) files.
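
One way such support could look, sketched with the standard gzip module. The function find_in_gzip is a hypothetical helper, not an existing PANhunt function, and it assumes the compressed content is text:

```python
import gzip
import re


def find_in_gzip(path, pattern, encoding='utf-8'):
    """Return (line_number, match) pairs for a regex inside a .gz text file."""
    regex = re.compile(pattern)
    hits = []
    # 'rt' decompresses and decodes on the fly; undecodable bytes are dropped.
    with gzip.open(path, 'rt', encoding=encoding, errors='ignore') as handle:
        for line_no, line in enumerate(handle, start=1):
            hits.extend((line_no, m.group(0)) for m in regex.finditer(line))
    return hits
```

Reading line by line keeps memory use flat even for large compressed logs.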

regex exclusion & exclusion based on prior string

I found that on Windows Server 2019 almost every user folder has a few false-positive detections.

C:\Users__\Local Settings\Packages\Microsoft.Windows.Cortana_cw5n1h2txyewy\

Is there any possibility to exclude detections based on a regular expression, or can I only exclude "C:\Users"?

Also, would it be possible to exclude detections based on what comes before the PAN? For example, if "messageID:" appears on the same line before the falsely detected PAN.
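
Both kinds of exclusion asked about above could be sketched as a single filter. Everything here (is_excluded and its parameters) is hypothetical and only shows the idea; it does not describe an existing PANhunt option:

```python
import re


def is_excluded(path, line, match_start, path_patterns=(), prefix_patterns=()):
    """Decide whether a detection should be dropped.

    path_patterns are regexes tested against the file path; prefix_patterns
    are tested against the text on the same line before the match.
    """
    if any(re.search(p, path) for p in path_patterns):
        return True
    prefix = line[:match_start]
    return any(re.search(p, prefix) for p in prefix_patterns)


line = 'messageID: 4111111111111111'
drop = is_excluded('mail.log', line, line.index('4111'),
                   prefix_patterns=[r'messageID:'])
```

A path pattern such as r'Microsoft\.Windows\.Cortana_' would cover the Cortana package folder without excluding all of C:\Users.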

Operation not permitted

error while loading shared libraries: libz.so.1: failed to map segment from shared object: Operation not permitted

"search" default in panhunt.ini seems to be ignored

I changed panhunt.ini to scan D:\ by default by changing the first line under [DEFAULT] from:
#search = C:\
to:
search = D:\
but it still seems to be scanning drive C:
With -s d:\ on the command line, it works fine.

datetime limitation (Y10K)

This may be a bit silly, but for MAPI dates (PtypTime) beyond Y10K, which are 'valid' on the MAPI side though probably not very meaningful, pst.py raises a PSTException in the 'get_time' method (year out of range). This probably means some other steps are skipped when processing the affected message.

In our version, we just skip the intermediate datetime representation (as it's MAPI-to-MAPI), but you might want to silently convert such dates to the year 9999, or something similar, to avoid skipping steps because of the exception. (There's nothing really wrong on the MAPI side.)
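
A minimal sketch of the clamping idea, assuming the value is a 64-bit count of 100-nanosecond ticks since 1601-01-01. The function name filetime_to_datetime is illustrative and is not the 'get_time' method from pst.py:

```python
from datetime import datetime, timedelta

FILETIME_EPOCH = datetime(1601, 1, 1)
MAX_DATETIME = datetime(9999, 12, 31, 23, 59, 59)


def filetime_to_datetime(filetime):
    """Convert 100-ns ticks since 1601 to a datetime, clamping at year 9999."""
    delta = timedelta(microseconds=filetime // 10)
    if delta > MAX_DATETIME - FILETIME_EPOCH:
        # Out of range for Python's datetime: clamp instead of raising.
        return MAX_DATETIME
    return FILETIME_EPOCH + delta
```

Comparing the timedelta before adding it avoids relying on an exception for control flow, so the rest of the message is still processed.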

Exclude temp files that Office opens

Thanks for the great program!

When I run a scan on my computer, I get errors on the temp files that Office opens, for example:
ERROR Invalid ZIP file on C:\xxxxxx\~$LDAP OPERA_GROUP AND USERS.xlsx:0
ERROR Invalid ZIP file on C:\xxxxxx\xxxx\~$Schema 2 Opera user permission.csv.xlsx
ERROR Invalid ZIP file on C:\xxxxxx\xxxx\W10\~$usersyyyyy18072019.xlsx

Usually these files just list which user has the file open, so they can probably be excluded from scanning, as it's very unlikely they store any of the original file's data.
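
The files in the errors above all have names starting with ~$, so a simple name check would be enough to skip them. A sketch, with is_office_lock_file as a hypothetical helper name:

```python
import re


def is_office_lock_file(path):
    """True when the file name starts with '~$', as Office owner files do."""
    # Split on both separators so Windows paths work on any platform.
    name = re.split(r'[\\/]', path)[-1]
    return name.startswith('~$')
```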

Error when I try to work with files that syslog writes to

Traceback (most recent call last): | ETA: 2:15:35 PANs:0
File "panhunt.py", line 251, in <module>
total_files_searched, pans_found, all_files = hunt_pans()
File "panhunt.py", line 196, in hunt_pans
total_docs, doc_pans_found = filehunt.find_all_regexs_in_files([afile for afile in all_files if not afile.errors and afile.type in ('TEXT','ZIP','SPECIAL')], pan_regexs, search_extensions, 'PAN', gauge_update_function)
File "/home/kochetkov/PANhunt-master/filehunt.py", line 326, in find_all_regexs_in_files
matches = afile.check_regexs(regexs, search_extensions)
File "/home/kochetkov/PANhunt-master/filehunt.py", line 100, in check_regexs
except WindowsError:
NameError: global name 'WindowsError' is not defined
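
The NameError arises because WindowsError only exists on Windows. One common workaround is to fall back to OSError where the name is missing; the sketch below assumes that treating the error as a plain OSError is acceptable, and read_or_none is a hypothetical helper rather than the code in filehunt.py:

```python
try:
    WindowsError  # defined only on Windows builds of Python
except NameError:
    # Assumption: on other platforms a plain OSError is an acceptable stand-in.
    WindowsError = OSError


def read_or_none(path):
    """Return the file's bytes, or None when it cannot be opened or read."""
    try:
        with open(path, 'rb') as handle:
            return handle.read()
    except (WindowsError, IOError):
        return None
```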

is this still being supported?

I am looking at this and it seems worth trying. I just want to be sure that it is still being supported; if so, I might contribute a few commits myself.

Is this still being supported?

cannot access submessages

The following allows easy access to submessages (embedded messages), in other words attachments of type ATTACH_EMBEDDED_MSG. If we have a property of type PR_ATTACH_DATA_OBJ, we can call pst.Message, passing the parent message and the property value (which is now the subnode nid), as follows, to create a message object that we can process further (via submessage.pc.props etc.):

current calling code:

 subnode_nid = struct.unpack('I', prop.value)[0]
 submessage = pst.Message(subnode_nid, self.ltp, self.nbd, parent_message)

current change to make this work:

diff --git a/ECtools/pst/kopano_pst/pst.py b/ECtools/pst/kopano_pst/pst.py
index 9cf4a10..00856c3 100644
--- a/ECtools/pst/kopano_pst/pst.py
+++ b/ECtools/pst/kopano_pst/pst.py
@@ -795,7 +795,7 @@ class PType:
         elif self.ptype == PTypeEnum.PtypNull:
             return None
         elif self.ptype == PTypeEnum.PtypObject:
-            return bytes
+            return bytes[:4]
         else:
             raise PSTException('Invalid PTypeEnum for value %s ' % self.ptype)

@@ -1149,7 +1149,7 @@ class LTP:
             PTypeEnum.PtypMultipleBinary:PType(PTypeEnum.PtypMultipleBinary, 2, False, True),
             PTypeEnum.PtypUnspecified:PType(PTypeEnum.PtypUnspecified, 0, False, False),
             PTypeEnum.PtypNull:PType(PTypeEnum.PtypNull, 0, False, False),
-            PTypeEnum.PtypObject:PType(PTypeEnum.PtypObject, 0, False, False)
+            PTypeEnum.PtypObject:PType(PTypeEnum.PtypObject, 4, False, True)
         }


@@ -1347,12 +1347,18 @@ class Message:
     afStorage = 0x06


-    def __init__(self, nid, ltp):
+    def __init__(self, nid, ltp, nbd=None, parent_message=None):

-        if nid.nidType != NID.NID_TYPE_NORMAL_MESSAGE:
-            raise PSTException('Invalid Message NID Type: %s' % nid_pc.nidType)
         self.ltp = ltp
-        self.pc = ltp.get_pc_by_nid(nid)
+        if parent_message:
+            subnode = parent_message.pc.hn.subnodes[nid]
+            datas = nbd.fetch_all_block_data(subnode.bidData)
+            hn = HN(subnode, ltp, datas)
+            self.pc = PC(hn)
+        else:
+            if nid.nidType != NID.NID_TYPE_NORMAL_MESSAGE:
+                raise PSTException('Invalid Message NID Type: %s' % nid_pc.nidType)
+            self.pc = ltp.get_pc_by_nid(nid)
         self.MessageClass = self.pc.getval(PropIdEnum.PidTagMessageClassW)
         self.Subject = ltp.strip_SubjectPrefix(self.pc.getval(PropIdEnum.PidTagSubjectW))
         self.ClientSubmitTime = self.pc.getval(PropIdEnum.PidTagClientSubmitTime)

UnicodeDecodeError: 'utf16' codec can't decode bytes

When scanning a fileserver, it didn't like one of the .PST files:

Traceback (most recent call last):|                     | ETA:  --:--:-- PANs:0
  File "panhunt.py", line 251, in <module>
  File "panhunt.py", line 198, in hunt_pans
  File "filehunt.py", line 349, in find_all_regexs_in_psts
  File "filehunt.py", line 139, in check_pst_regexs
  File "pst.py", line 1989, in get_total_attachment_count
  File "pst.py", line 1924, in message_generator
  File "pst.py", line 1329, in __init__
  File "pst.py", line 1135, in get_pc_by_nid
  File "pst.py", line 905, in __init__
  File "pst.py", line 627, in __init__
  File "pst.py", line 707, in value
  File "encodings\utf_16_le.py", line 16, in decode
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 508-509: unexpected end of data
Failed to execute script panhunt

Not quite sure why, could it be that the file changed during the scan?

Unable to filter to just PSTs

I can't seem to figure out how to filter down to just PST files for the Windows executable to scan. I want to reduce the amount of scanning that the script has to do to a minimum as I am only concerned about PST files. Is there any way to do this? I've tried the -m flag but with just this the script will still scan all files.

Compiled Win exe won't run

I compiled the script as described and when I run the exe I get the following error:

Traceback (most recent call last):
File "panhunt.py", line 251, in <module>
File "panhunt.py", line 193, in hunt_pans
File "filehunt.py", line 263, in find_all_files_in_directory
AttributeError: 'module' object has no attribute 'Percentage'
Failed to execute script panhunt

I have been unable to determine what the issue is. This is an awesome tool that could be very useful for me. I appreciate you sharing it and any help getting it to work if you can. -Matt

incorrectly encoded string property raises exception

One of our customers provided a PST file that contains an incorrectly encoded PR_BODY property. Looking at it, it contains the following byte sequence:

0x3d 0xd8 0xd 0x0

So a surrogate pair is started (0xD83D is a high surrogate), but the next 2 bytes do not complete it. It seems useful for your version to at least add errors='ignore' to the decode call, so the whole process can continue:

         elif self.ptype == PTypeEnum.PtypString:
-            return bytes.decode('utf-16-le') # unicode
+            return bytes.decode('utf-16-le', errors='ignore') # unicode
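
The effect of the errors argument can be checked in isolation with the byte sequence quoted above. This is a standalone sketch; decode_lenient is an illustrative name, not a function in pst.py:

```python
BAD = b'\x3d\xd8\x0d\x00'  # high surrogate 0xD83D followed by 0x000D


def decode_lenient(raw):
    """Decode UTF-16-LE bytes, dropping sequences that cannot be decoded."""
    return raw.decode('utf-16-le', errors='ignore')


try:
    BAD.decode('utf-16-le')  # strict mode is the default
except UnicodeDecodeError as exc:
    print('strict decoding fails:', exc.reason)

# The lenient call returns a string instead of raising.
print(repr(decode_lenient(BAD)))
```

Using errors='replace' instead would keep a visible U+FFFD marker where data was lost, which may be preferable when the damage should stay noticeable.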
